What gives a stroke publication impact? Assessing traditional and alternative metrics of scientific impact for papers published in the journal Stroke [version 3; peer review: 1 approved, 1 not approved]

Background: The 'impact' of a scientific paper is a measure of influence in its field. In recent years, traditional, citation-based measures of impact have been complemented by Altmetrics, which quantify outputs including social media footprint. As authors and research institutions seek to increase their visibility both within and beyond the academic community, it is important to identify and compare the determinants of traditional and alternative metrics. We explored this using Stroke, a leading journal in its field. Methods: We described the impact of original research papers published in Stroke (2015-2016) using citation count and Altmetric Attention Score (Altmetrics). Using these two metrics as our outcomes, we assessed univariable and multivariable associations with 21 plausibly relevant publication features. We set the significance threshold at p<0.01. Results: Across 911 papers published in Stroke, there was an average citation count of 21.60 (±17.40) and Altmetric score of 17.99 (±47.37). The two impact measures were weakly correlated (r=0.15, p<0.001). Citations were independently associated with five publication features at a significance level of p<0.01, including Time Since Publication (beta=0.87), Number of Authors (beta=0.22) and Publication Type (beta=6.76). Conclusions: Content and format may contribute to impact, but the relevant features differ for traditional measures and Altmetrics, and explain only a very modest proportion of variance in the latter. Citation counts and Altmetrics seem to represent different constructs and, therefore, should be used in conjunction to allow a more comprehensive assessment of publication impact.


Introduction
Recent decades have seen numerous evidence-based improvements in stroke care. These changes have been driven, at least in part, by high-impact stroke research papers. Stroke (American Heart Association) is the highest ranked clinical journal in stroke research 1 and for 50 years has published important research incorporating a variety of research techniques across a wide spectrum of stroke pathology and management. We felt Stroke provided a suitable substrate for assessing the impact of published articles: it publishes a wide range of clinical science and generally attains good citation and Altmetric ratings, yet represents a single-topic journal.
The 'impact' of a scientific paper is a measure of the influence the paper has in its field. Impact is increasingly important not only to authors, but also research institutions and funders. There is no consensus definition or standardized measure of impact 2 . A traditional approach to quantifying impact of a research paper would be to assess a surrogate measure such as the number of citations a paper receives 3-5 . However, for many reasons this is an imperfect measure of impact as there is often substantial lag time between publication and subsequent citations in other papers. Citation does not imply that the results of the paper have been read and understood, and citations are a poor measure of impact beyond the academic community.
Alternative metrics or 'Altmetrics', are contemporary measures of impact that are becoming increasingly popular in the age of social media 6 . Altmetric Attention Scores are a measure of a publication's visibility across various online platforms. Thus, Altmetrics offer an immediate and real-time updated, continuous score calculated using a weighted points based algorithm 7 . For example, a paper would obtain eight Altmetric points if cited by a news outlet, five if featured in a blog and one if mentioned on Twitter 8 .
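The weighted-points idea can be illustrated with a minimal sketch, assuming only the three example weights quoted above (news = 8, blog = 5, tweet = 1); the real Altmetric algorithm is proprietary and considerably more elaborate.

```python
# Toy weighted attention score using the example weights from the text
# (news = 8, blog = 5, tweet = 1). This is an illustration only, not
# Altmetric's actual (proprietary) algorithm.
WEIGHTS = {"news": 8, "blog": 5, "tweet": 1}

def attention_score(mentions):
    """mentions: dict mapping mention source -> number of mentions."""
    return sum(WEIGHTS.get(source, 0) * count for source, count in mentions.items())

# Two news stories, one blog post and ten tweets:
print(attention_score({"news": 2, "blog": 1, "tweet": 10}))  # 2*8 + 1*5 + 10*1 = 31
```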
In certain scientific disciplines it has been shown that various features of a paper predict citation count 5,9-19 , but there has been limited research describing which factors influence Altmetrics. A greater understanding of impact within the field of stroke that takes account of social media and other avenues of communication would help researchers, funders and faculty making appointments.
Our aim was to describe impact and features associated with traditional and Altmetric based measures for research papers published in Stroke.

Methods
Our data collection and analysis followed a preregistered protocol; the protocol and data are available online (https://www.researchgate.net/project/What-Gives-a-Paper-Impact).
All aspects of searching, data extraction and analysis were performed by a single trained researcher (LW), with additional input for internal validation as described below. This data-driven analysis did not require ethical approval; all data are in the public domain and collated data are included in this paper.
We included original research papers published in Stroke during 2015 and 2016, excluding Synopses, Letters to the Editor, Editorials, Guidelines and Study Protocols. The publication dates were chosen to allow papers sufficient time to acquire citations and achieve impact, while recognizing that Altmetrics are a new measure and require a relatively contemporary dataset 8 .

Impact measures and associations
We used number of citations and Altmetric Attention Score as measures of impact for each paper studied.
Altmetric Attention Score was available from the publisher and analysed both as a linear measure and dichotomised into 'high' and 'low' using the mean Altmetric Score of the dataset as the threshold.
We described number of citations for each paper using the Google Scholar online resource. Data were collated at the individual paper level in a bespoke data extraction template. As data collection took several months (January to April 2019), we censored citation count at December 2018. This avoided the time bias of more citations accruing for papers assessed at the end of the study period.
We chose 21 different features of publications and investigated their association with the impact of Stroke papers. Features were selected if previously shown to be associated with impact in the published literature and if data were available and common to all Stroke publications. We performed external searches to find details not immediately available in the primary paper; for example, for lead authors we searched their host institution or followed links to their public-facing research accounts such as ORCID or ResearchGate. Full details of the features and their definitions are described in Table 1.
Select features are described below: Title Type was classified as: indicative (informs about the subject and outcome of the paper), descriptive (describes the paper's subject without revealing its main outcome) or interrogative (title is in the form of a question).

Time Since Publication (linear): number of months since the paper was first published.

Title Length (linear): number of words in the main title.

No. of Key Words (linear): number of key words listed.

Title Type (categorical):
Indicative (n=678): informs about both the subject of the paper and its main outcomes and conclusion.
Descriptive (n=218): describes the subject of the paper without revealing its main outcome or conclusion.
Interrogative (n=15): in the form of a question that the paper aims to answer.

International Collaboration (categorical):
Yes (n=348): authors are affiliated with at least two different countries.
No (n=563): all authors are affiliated with the same country.

Collective (categorical):
Yes (n=79): study conducted on behalf of an institution.
No (n=832): study conducted independent of an institution.

Publication Type (categorical):
Full paper (n=720): full-length publication with a maximum of 5000 words.
Brief Report (n=191): papers presenting less developed datasets than appropriate for a full paper, with a maximum of 2000 words.

Nature of Research (categorical):
Basic Science (n=122): animal experiments, cell studies, biochemical, genetic and physiological investigations, and studies of drugs.
Interventional Study (n=80): designed to evaluate the effect of an introduced intervention (either therapeutic or preventive) on health-related outcomes, where participants are assigned by the researchers to particular study groups.
Observational Study (n=666): focuses on the observation of incidents and/or relationships between factors and outcomes, where researchers do not have control over the investigated independent variables.
Review (n=80): aims to summarise evidence from primary research related to a specific topic; can be systematic or narrative/simple; includes meta-analysis.

Sample Size (linear): total number of subjects who participated in the study; for follow-up studies, the number of participants at baseline.

Number of Tables (linear): number of tables used throughout the paper.

Number of Figures (linear): number of figures used throughout the paper.

Study Outcome (categorical):
Positive Findings (n=895): author(s) report a significant effect or trend of variables of interest, indicating group differences, associations, or properties of an intervention or diagnostic/prognostic tool.
Neutral Findings (n=16): author(s) report that results do not indicate any effect of variables of interest.

Number of References (linear): number of references cited.

Editorial (categorical):
Yes (n=26): the study received its own editorial in Stroke.
No (n=885): the study did not receive its own editorial in Stroke.

Category of Paper was determined by the subject matter. Stroke lists twelve categories, but due to overlap we combined 'Health Policy/Outcomes Research' with 'Prevention and Health Services Delivery', and 'Imaging' with 'Interventional Radiology', giving ten categories.

Funding
Country of Affiliation was based on corresponding author details and was categorised into English speaking and non-English speaking according to native language.
Institution Ranking was based on corresponding author details and used the Academic Ranking of World Universities table for the year of publication 20 .
Study Outcome was based on interpretation of data offered in the paper and was divided into Positive Findings (authors report on a significant effect of variables of interest) or Neutral Findings (authors report that results do not indicate any effect of variables of interest).
Nature of Research describes the nature of the study and was categorised into either Basic Science (animal experiments, cellular, biochemical or genetic studies), Interventional (evaluates the effect of an introduced intervention on health-related outcomes), Observational (observation of incidents or relationships between factors and outcomes) and Review (summarises the evidence from primary research related to a specific topic, including meta-analyses).
Internal Validation: The first month of data from each journal was extracted and categorized in duplicate with a senior researcher in real time (TQ). As a further validation step, a researcher masked to previous data extraction (DD) cross-checked 10% of the data. As a final validation, the main researcher (LW) re-reviewed the initial six months of data. Any discrepancies were resolved through consensus, with the senior author (TQ) making final decisions.

Analyses
We assessed the correlation between the two measures of impact. We used Pearson correlation coefficients, analysis of variance or chi-square as appropriate to investigate the associations between each publication feature and the measures of impact. For Category of Paper, we described the relative impact of each category using Population Studies as a reference.
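The Pearson correlation step can be sketched in plain Python; the per-paper counts below are hypothetical illustrations, not the study data.

```python
import math

def pearson_r(x, y):
    """Plain-Python Pearson correlation coefficient between two samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical per-paper citation counts and Altmetric scores:
citations = [5, 12, 3, 40, 21]
altmetric = [2, 30, 1, 10, 17]
print(round(pearson_r(citations, altmetric), 2))
```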
We used multivariable regression analyses with stepwise backward elimination for each measure of impact, including all variables in the initial model. For these models we set significance at the conservative threshold of p<0.01 and present the unstandardized coefficients. We also conducted binomial logistic regression on Altmetric scores, dichotomised into 'high' or 'low' based on the mean score of the dataset. The logistic regression model comprised six variables that significantly explained variance in 'high' Altmetric Scores: Time Since Publication, Title Length, Number of References, Category of Paper, Country of Affiliation and Nature of Research.
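The dichotomisation at the dataset mean is straightforward to sketch; the score values below are hypothetical, not the study data.

```python
# Dichotomising Altmetric scores at the dataset mean, as described above.
# Hypothetical scores for illustration:
scores = [3, 1, 18, 40, 120, 2, 9]

mean_score = sum(scores) / len(scores)
labels = ["high" if s > mean_score else "low" for s in scores]
print(labels.count("high"), labels.count("low"))
```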
Where no Altmetric data were available, the paper was not included in the analysis.
Estimating that more than 20 papers per variable are required to ensure sufficient statistical power of the regression model, with 21 potential variables we comfortably had sufficient sample size for our analyses. All analyses were conducted using Minitab 18 and SPSS (Version 24, IBM).
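The sample-size reasoning above is a simple rule-of-thumb check:

```python
# Rule of thumb from the text: more than 20 papers per candidate variable.
n_papers = 911
n_variables = 21
papers_per_variable = 20

required = n_variables * papers_per_variable
print(required, n_papers > required)  # 420 True
```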

Validation
A trained researcher (DD), working independently of the main researcher, cross-checked 10% of papers; differences were noted and discussed. Blinded cross-comparison of n=91 (10%) of included papers yielded n=8 (8.8%) differences in scoring. Cross-comparison of the first six months of data (n=242, 26.6%) yielded n=39 (16.1%) differences in scoring.
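The reported discrepancy rates follow directly from the counts:

```python
# Verifying the reported discrepancy percentages from the raw counts.
print(round(100 * 8 / 91, 1))    # 8.8  (8 differences among 91 cross-checked papers)
print(round(100 * 39 / 242, 1))  # 16.1 (39 differences in the first six months, n=242)
```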

Altmetric and Citation data
There were 911 research papers in Stroke between 2015 and 2016, of which 889 had Altmetric Scores available (Table 2). Papers with no Altmetric Score available are described in Table 3; these comprised 2.4% of the sample (n=22).
Number of citations ranged from 0-150, while Altmetric Attention Score ranged from 1-503. A positive but weak correlation was observed between the two measures of impact (Figure 1).
Univariable associations between each publication feature and measure of impact are described in Table 4. Time Since Publication had a negative correlation with Altmetric Score but a positive correlation with number of citations. Within Nature of Research, Review papers were associated with the highest numbers of citations, but there was no association with Altmetric Score. Within Category of Paper, no category had significantly higher citations or Altmetric Score compared with our reference of Population Studies. We present these data visually to allow comparisons (Figure 2): citations and Altmetric scores are shown with reference to the comparator (Population Studies), so a negative value implies that a category of paper receives, on average, fewer citations or a lower Altmetric score, while higher values suggest the opposite.
Results of the multivariable linear regression models are given in Table 5. As a further analysis, Altmetric scores were dichotomised into 'high' or 'low' based on the mean score of the dataset. The mean Altmetric Score was 17.99; scores above this value were deemed 'high' (n=178), whilst scores below it were 'low' (n=711). Results of logistic regression models with a dichotomised Altmetric outcome are given in Table 6. We also offer a data visualisation (Figure 3) describing the odds of certain categories of paper attaining a 'high' Altmetric score when compared with a reference of papers in the 'Population Studies' category. Studies in Translational Medicine and Vascular Neurosurgery were 89.5% and 83.4% less likely to achieve a 'high' Altmetric Score than Population Studies, and studies in Genetics, Imaging and Interventional Radiology, and Emerging Therapies were 86.5%, 63% and 67% less likely.
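The "% less likely" figures are direct transformations of logistic-regression odds ratios. A minimal sketch; note that the odds ratio of 0.105 below is back-calculated from the 89.5% figure for illustration, not a value quoted from Table 6.

```python
# Converting an odds ratio (vs. the Population Studies reference) into
# the "% less likely" phrasing used in the text. The OR of 0.105 is a
# back-calculated illustration, not a value read from Table 6.
def percent_less_likely(odds_ratio):
    return round((1 - odds_ratio) * 100, 1)

print(percent_less_likely(0.105))  # 89.5
```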

Discussion
Our analysis confirms that papers published in Stroke have impact, whether measured using traditional metrics or newer Altmetrics. There is no agreement on what constitutes a 'high' citation count, and measures will vary according to year of publication and topic. Altmetric Scores for Stroke papers ranged to greater than 500; there is no consensus value for a high Altmetric score, although previous data suggest that a score of 20 or more is relatively high 21 .
There is no gold standard for assessing impact. We used two metrics that represent traditional and alternative approaches. These measures are commonly used when assessing impact, but they are not the only plausible metrics; other potential markers of impact would include factors such as inclusion in guidelines or training materials. Our finding that citations and Altmetrics were only weakly correlated with one another is consistent with previous research 22 . The factors that contribute to a high citation count and to high Altmetrics were distinct. For example, Time Since Publication was associated with both measures of impact but worked in opposite directions: Altmetrics rise immediately after publication, whereas it takes time to accrue citations. Our findings suggest that these two impact metrics, while related, measure different constructs.
Our analyses of factors associated with impact give potential authors few clues on how to improve the impact of papers submitted to Stroke. We studied factors previously shown to be associated with citations or plausibly associated with impact. That some of these factors showed no association in our data, and for others the association was modest, is a reminder that bibliometrics is dynamic and context specific. It should not be assumed that patterns seen in publications from one branch of science at a particular time will also be apparent in a different discipline or time. The low explained variance, particularly for Altmetric scores, suggests that other factors, unmeasured in our study, contribute to impact.
For papers published in Stroke, our chosen factors explained only a modest amount of the variation in citation count and even less of the variation in Altmetrics. Certain types of research seemed to generate less impact: in general, pre-clinical research papers had both lower Altmetrics and fewer citations. One could argue that these results reflect the predominant clinical focus and readership of the journal rather than a lesser impact of pre-clinical research. Previous literature has suggested that papers with higher citation counts are written by authors affiliated with English-speaking countries 12 , have longer titles 10 and cite more references 19 . However, we found no association between these features and citation counts. Based on our results, it does not seem possible to 'game' the system to achieve higher impact for a Stroke paper.
Large population based studies seemed to attract greater attention online from news outlets, blogs and tweets. For example, the study in Stroke with the highest Altmetric score of 503 concluded that female smokers are at a higher risk of subarachnoid haemorrhage 23 , whilst another high scoring study (Altmetric Attention Score: 421) found that lower fitness levels in middle age confer an increased stroke risk 24 . The impressive Altmetrics were not reflected in the number of citations (21 and 9, respectively). Only one study, which reported the improvement of clinical outcomes following stem-cell transplant in stroke patients 25 , acquired both a large number of citations (150) and one of the highest Altmetric Scores of 450. When interpreting these Altmetrics, one must remember that a large social media footprint is not synonymous with study quality 26 .
In fact, a controversial study with questionable method may attract considerable social media attention for all the wrong reasons. For these reasons, and others, we recognize that the use of Altmetrics, and indeed any quantitative measures, are controversial methods of describing impact. Our purpose was not to promote Altmetrics, but rather we recognize that these measures are increasingly used by funders and promotion committees. Other potential measures of impact are available, one example is the use of resources such as Mendeley.
With no consensus on the optimal approach, we majored on Altmetrics as our preferred measure, as this is the best recognized of the various tools. Traditional metrics such as citation rates are also not without limitations 27 and we would caution against using either measure in isolation to describe the complex concept of impact.
Our study has various strengths. We assessed an international, clinical stroke journal using accepted measures of impact. We worked within a time-frame giving studies enough time to acquire citations, whilst also being sufficiently recent to allow meaningful Altmetric scoring. We worked to a pre-defined protocol and embedded a series of internal validation steps.
There were limitations to our data extraction. Altmetric Attention Scores were missing for 22 papers. Giving these studies an Altmetric Score of 0 may not have reflected their true online visibility, as it was unclear whether they had achieved no impact or whether the score was missing for another reason. Some of our chosen features of interest were poorly reported, resulting in missing data; for example, sample size was often unclear in Basic Science papers. We used Google Scholar to quantify citations and recognize that differing bibliometric data sources may have differing coverage.

Conclusion
We have shown that research published in Stroke has impact regardless of how impact is being measured. It is not possible to predict impact based solely on the features and format of the paper. Factors unique to each paper such as the quality of science and novelty of approach are likely to be more important. Authors, readers and journal editors should consider both traditional and Altmetric Score measures when assessing impact. We suggest that Altmetrics could be used to complement traditional metrics to allow a broader understanding of a paper's impact, not only within the scientific community but also in the public domain. The data contained in this paper offer a yardstick for describing impact of clinical stroke research against which other papers could be compared.

Data availability
All data underlying the results are available as part of the article and no additional source data was required.

Robin Haunschild
Max Planck Institute for Solid State Research, Heisenbergstr, Stuttgart, 70569, Germany
I do not see support in the presented data that the Altmetric Attention Score measures any useful aspect of a publication's quality. Thus, I still do not see support for the conclusion that citation counts and Altmetric Attention Scores "should be used in conjunction to allow a more comprehensive assessment of publication impact".
I wonder why F1000Prime scores were not used in the analysis. The authors' reply why Google Scholar was used instead of an established citation database for retrieving citation counts is not helpful. Although there is no gold standard resource, established citation databases should be preferred over search engines if there is no good reason for using the search engine. For example, low coverage of the journal in established citation databases would be a good reason. Is this the case?
The low explained variance by the Altmetric Attention Score is explained by the authors with the hypothesis that this "suggests that other factors, unmeasured in our study, are contributing to impact." I do not see how one can jump to the hypothesis that impact is involved at all? Maybe, the Altmetric Attention Score mainly shows something between short-lived attention and buzz on the web.
It is good to put Altmetric Attention Scores into context. I do not see if using a value that was determined across the full dataset by the company Altmetric.com is fair. The details page of Altmetric.com also provides an assessment "compared to outputs of the same age and source". Such a measure would be fairer in my opinion.
Now, I do understand Figure 2, but I suggest modification of the y axis label to make it easier for readers. The label should indicate that it is a difference in impact measurements. Preferably, the precise difference (between which measures) should be stated in the y axis label or figure caption.
I do not see a helpful comment in the authors' replies regarding comments 15 and 16. Why was the odds ratio analysis not performed for citation counts? It might be interesting to see both odds ratio analyses.
It is common practice for altmetrics studies (especially when the Altmetric Attention Score is used) that missing altmetrics values are replaced by zeros. The reason is simple: Altmetric.com uses Twitter and other resources for discovering publications. Thus, it is unlikely that they miss publications that were mentioned on these primary resources. Therefore, publications not known to Altmetric.com should be counted with a zero Altmetric Attention Score, also zero scores for the other sources that are included in the Altmetric Attention Score. Leaving them out of the analysis might distort the results depending on the sample size. This study includes 911 publications. The 22 excluded publications comprise only 2.4%. However, the proportion of excluded papers might be much larger in the analyzed subgroups.

Competing Interests:
No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.
Author Response 08 Oct 2022
Li Siang Wong, Royal Alexandra Hospital, Paisley, Glasgow, UK
We thank both peer reviewers for their detailed and insightful feedback. We are encouraged that the first reviewer felt our manuscript was suitable for publication.
The second reviewer has some remaining suggestions that we have responded to, and we hope the paper is further improved. The topic of scientific 'impact' can provoke strong opinion. While some aspects of our analysis could have been performed differently, we are unable to change the design and conduct of the analysis at this stage. We would hope that we offer a comprehensive and transparent report of our method and results. We would welcome any debate it provokes once it is formally published.
We provide a point by point response in the comments below and have shared a manuscript with tracked changes to highlight where we have amended text.

Robin Haunschild
Comment 1: I do not see support in the presented data that the Altmetric Attention Score measures any useful aspect of a publication's quality. Thus, I still do not see support for the conclusion that citation counts and Altmetric Attention Scores "should be used in conjunction to allow a more comprehensive assessment of publication impact".

Answer 1:
We agree that Altmetric Attention Scores do not reflect a publication's quality, as stated in the discussion: 'when interpreting these Altmetrics, one must remember that a large social media footprint is not synonymous with study quality'. We did not assess the quality of publications in Stroke; we assessed their impact, a separate entity from quality. Our conclusion that Altmetric Attention Scores 'should be used in conjunction to allow a more comprehensive assessment of a publication's impact' recognizes the various dimensions of impact. Altmetrics offer a measure that complements the information from citation rates.
Comment 2: I wonder why F1000Prime scores were not used in the analysis. The authors' reply why Google Scholar was used instead of an established citation database for retrieving citation counts is not helpful. Although there is no gold standard resource, established citation databases should be preferred over search engines if there is no good reason for using the search engine. For example, low coverage of the journal in established citation databases would be a good reason. Is this the case?
Answer 2: We agree that F1000Prime scores could have been analysed alongside citations and Altmetrics. Google Scholar was chosen as it is widely accessible, instant and provides easily extractable data. We recognise that there is no gold standard and that various other citation databases could have been used. We have already added this to the limitations section.

Comment 3:
The low explained variance by the Altmetric Attention Score is explained by the authors with the hypothesis that this "suggests that other factors, unmeasured in our study, are contributing to impact." I do not see how one can jump to the hypothesis that impact is involved at all? Maybe, the Altmetric Attention Score mainly shows something between short-lived attention and buzz on the web.

Answer 3:
We apologise if the intention was not clear. We are taking Altmetric score as a measure of 'impact'. We accept that impact has no consensus definition and may well include 'short-lived attention and buzz on the web'. Altmetrics are now widely used in various scientific journals as a way of quantifying a publication's online visibility. We agree that low variance suggests factors which affect a publication's Altmetric score are different to those which affect a paper's traditional impact metrics, as discussed in the paper.

Comment 4:
It is good to put Altmetric Attention Scores into context. I do not see if using a value that was determined across the full dataset by the company Altmetric.com is fair. The details page of Altmetric.com also provides an assessment "compared to outputs of the same age and source". Such a measure would be fairer in my opinion.

Answer 4:
We agree that Altmetric scores should be viewed in context. As mentioned in the discussion: 'there is no consensus value for a high Altmetric score, although previous data suggests that a score of 20 or more is relatively high', as stated on the Stroke website. The tab 'outputs of similar age from Stroke' gives a ranking of that publication. We agree that this would be another way of analysing the Altmetric scores in terms of ranking amongst Stroke publications from that year. However, since we did not include all publications from Stroke in each year (editorials etc were excluded), it would be unfair to use as a comparison.
Comment 5: Now, I do understand Figure 2, but I suggest modification of the y axis label to make it easier for readers. The label should indicate that it is a difference in impact measurements. Preferably, the precise difference (between which measures) should be stated in the y axis label or figure caption.

Answer 5: This is further supplemented by the following text: "In this Figure we present citations and Altmetric scores, with reference to a comparator (Population studies), here a negative value implies that the category of paper receives on average less citations or lower Altmetric score, while higher values suggest the opposite".
Comment 6: I do not see a helpful comment in the authors' replies regarding comments 15 and 16. Why was the odds ratio analysis not performed for citation counts? It might be interesting to see both odds ratio analyses.

Answer 6:
We conducted the logistic regression to make the results more interpretable to the reader, with our main analysis being the linear regression. We only did this for Altmetric scores as we wanted to investigate the factors associated with high Altmetric scores. We agree that conducting logistic regression for citation counts would allow further comparison between factors affecting Altmetric and citation counts.
Comment 7: It is common practice for altmetrics studies (especially when the Altmetric Attention Score is used) that missing altmetrics values are replaced by zeros. The reason is simple: Altmetric.com uses Twitter and other resources for discovering publications. Thus, it is unlikely that they miss publications that were mentioned on these primary resources. Therefore, publications not known to Altmetric.com should be counted with a zero Altmetric Attention Score, also zero scores for the other sources that are included in the Altmetric Attention Score. Leaving them out of the analysis might distort the results depending on the sample size. This study includes 911 publications. The 22 excluded publications comprise only 2.4%. However, the proportion of excluded papers might be much larger in the analysed subgroups.
Answer 7: It may be unlikely that publications were missed by Altmetric; however, we could not find any explanation as to why these papers did not have a score displayed on the Stroke website. The Altmetric website states that a score of 0 indicates a publication has not received any online attention. However, the excluded publications did not have an Altmetric score of 0 (as far as we could tell); rather, no score was available at all. Given there was no discernible reason for this, we excluded these papers. These excluded papers (22/911, 2.4%) are unlikely to distort the results: their features are described in Table 3 and are very similar to those of the included papers described in Table 2. We therefore do not think excluding these papers had an impact on our analyses.
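The difference between the two handling strategies (exclusion vs zero-imputation) can be checked with a simple sensitivity calculation. The sketch below uses hypothetical scores, with `None` marking papers for which no score was available; it is an illustration of the comparison, not the authors' analysis.

```python
# Hypothetical Altmetric scores; None marks papers with no score displayed.
scores = [12, 0, 35, None, 8, 150, None, 4, 22, 0]

observed = [s for s in scores if s is not None]    # strategy 1: exclude missing
imputed = [0 if s is None else s for s in scores]  # strategy 2: treat missing as zero

mean_excluded = sum(observed) / len(observed)
mean_imputed = sum(imputed) / len(imputed)
```

Zero-imputation can only pull the mean downward relative to exclusion; with only 22/911 (2.4%) papers affected, the shift in the real data would be correspondingly small.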
Reviewer Report 12 January 2022

Robin Haunschild
Max Planck Institute for Solid State Research, Heisenbergstr., Stuttgart, 70569, Germany

This paper presents an analysis of the factors that contribute to the impact (as measured by citations and Altmetric Attention Score) of papers published in the journal Stroke in the years 2015 and 2016. The analyzed time period was justified, but the choice of the journal should be motivated in the introduction. I have several reservations about this paper, as described in the following:

One of the authors' conclusions is: "Citation counts and Altmetrics seem to represent different constructs and, therefore, should be used in conjunction to allow a more comprehensive assessment of publication impact." It is unclear if altmetrics as a whole represent a useful construct for assessment of publication impact. Such a recommendation is highly problematic. I also do not see support in the data and analysis for such a recommendation.
The authors seem to assume that altmetrics represent some kind of useful impact metric. As far as we know, this is not the case for most of them. Some valuable impact attribution can be made to Mendeley reader counts. However, Mendeley reader counts are not included in the Altmetric Attention Score.
The Altmetric Attention Score has been taken from the publisher of the journal. I assume the publisher did not count the altmetrics events. The Altmetric Attention Score is calculated by Altmetric.com. Thus, I assume that the original source of this score was Altmetric.com. Citations were taken from Google Scholar. There are some problems associated with citations from Google Scholar. Not much is known about the coverage of Google Scholar, but it is known that Google Scholar sometimes confuses blogs with journals and the coverage of Google Scholar is fluctuating. Why was no established bibliometric data source used? Considering the fact that the journal Stroke is a medical journal, it would have been interesting to analyze F1000Prime scores along with citations and altmetrics.
In various instances, altmetrics are described in a problematic manner, e.g., (i) in the abstract: "... Altmetrics, which quantify social media footprint." (altmetrics capture more than just social media footprint) and (ii) page 3: "Alternative metrics or 'Altmetrics', are contemporary measures of impact ... ." This implies that citations are outdated measures of impact. I disagree.
The sentence with reference 2 starts with "currently" whereas reference 2 is from 2009. Claims about the current status should not be referenced with papers that are more than ten years old.
In the next-to-last paragraph of the introduction, a single sentence is backed-up with references 5 and 9-19. Are that many references necessary to back-up this sentence? I think the references should be discussed in a bit more detail in such cases. The sentence following references 5 and 9-19, however, is not backed-up by any reference: "However, there has been limited research describing which factors influence Altmetrics." If there has been some research regarding this, some of it should be cited.
The second paragraph of the methods section states: "All aspects of searching, data extraction and analysis were performed by a single trained researcher (LW) with additional input for internal validation as described below." Somehow, this sentence implies universal training of LW. I am unsure if this sentence is properly located in the methods section as it partly sounds like an incomplete statement of contributions to the work. The following sentence states that "all data are in the public domain and collated data are included in this paper." I am unsure if Google Scholar citations and Altmetric Attention Scores are data in the public domain. The authors also state in the data availability section: "All data underlying the results are available as part of the article and no additional source data was required." I did not find the data.
This very short paragraph of the sub-section "validation" in the section "results" should be expanded.
I am confused by tables 2 and 3: The sample size has Q3 and maximum values higher than the number of analyzed papers. How were the samples constructed, or how was the sample size defined?
The final two paragraphs state that 21% of the variance in citations and only 3% of the variance in Altmetric Attention Score are explained. I doubt that the results based on that are useful.
The first paragraph of the discussion section states: "There is no agreement on what constitutes a 'high' citation count and measures will vary according to year of publication and topic. Altmetric Scores for Stroke papers ranged to greater than 500; however, there is no consensus value for a high Altmetric score, although previous data suggests that a score of 20 or more is relatively high 21." Usually, the top-X% (X is often chosen as 10) are used as a threshold for a high citation count for publication years and fields, sometimes also document types. Also the Altmetric Attention Score depends on the field, possibly also on time. Actually, the article detail page of Altmetric.com provides a placement within the top-X%, see for example: https://www.altmetric.com/details/13596962

I struggle to understand Figure 2: the y axis has the label "measure of impact" and the y axis contains negative values. The calculation of this "measure of impact" should be explained better.
The first paragraph on page 10 below Figure 3 states: "Certain types of research seemed to generate reduced impact, in general pre-clinical research papers had both lower Altmetrics and fewer citations. One could argue that these results reflect the predominant clinical focus and readership of the journals rather than a lesser impact of pre-clinical research." In this context, the paper "Citation Analysis May Severely Underestimate the Impact of Clinical Research as Compared to Basic Research" by N. J. van Eck et al. (DOI 10.1371/journal.pone.0062395) might be of interest.

The methodology behind Table 6 and Figure 3 should be explained in much more detail. What are "Population Studies" in this case? Why was this odds ratio analysis performed only for the Altmetric Attention Score and not for citation counts?
The last paragraph of the discussion debates the proper handling of missing values of the Altmetric Attention Score. The proper handling should be a score of zero. It is very unlikely that Altmetric.com missed a mention of a paper in one of their main data sources such as Twitter. One can speculate about many other reasons. Those would in a similar way also apply to citations, e.g., citations are missing because of incomplete citation linking, and we still use citation counts of zero.
Overall, there are too many problems with this paper in this version so that I can't endorse it.
measure that is increasingly used by funders and universities. Since multiple journals, including all of the AHA journals and AMRC itself, have started publishing the Altmetric scores of papers, we thought it would be useful to analyse the factors which may influence this. Our conclusion states that Altmetrics should not be used to assess a publication's impact by themselves, but should be used in conjunction with other, more traditional methods such as citation counts, to look at a paper's impact. We have added more detail on this aspect to the Discussion.

Comment 3:
The authors seem to assume that Altmetrics represent some kind of useful impact metric. As far as we know, this is not the case for most of them. Some valuable impact attributions can be made to Mendeley reader counts. However, Mendeley reader counts are not included in the Altmetric Attention Score. Answer: We agree, there is no consensus on how to assess impact and various tools are available. Our decision to major on Altmetrics was based on the popularity and availability of the tool. We have added text on this aspect to the manuscript. As stated on the Altmetric website: "Displayed on the details pages but not included in the Altmetric score are the number of Mendeley users who have saved the research to their library. You can view a breakdown of the demographics (location, discipline, etc) of these users on the summary tab details page". Mendeley reader counts are a separate impact metric to the Altmetric Attention Score. They are stated on the Altmetric website, as opposed to being incorporated into the score itself.

Comment 4:
The Altmetric Attention Score has been taken from the publisher of the journal. I assume the publisher did not count the Altmetrics events. The Altmetric Attention Score is calculated by Altmetric.com. Thus, I assume that the original source of this score was Altmetric.com. Citations were taken from Google Scholar. There are some problems associated with citations from Google Scholar. Not much is known about the coverage of Google Scholar, but it is known that Google Scholar sometimes confuses blogs with journals and the coverage of Google Scholar is fluctuating. Why was no established bibliometric data source used? Considering the fact that the journal Stroke is a medical journal, it would have been interesting to analyze F1000Prime scores along with citations and Altmetrics. Answer: We agree that the scope and coverage varies between bibliographic databases. There is no gold standard resource. We have added text on this issue to the Discussion (limitations) section.

Comment 5:
In various instances, Altmetrics are described in a problematic manner, e.g., (i) in the abstract: "... Altmetrics, which quantify social media footprint." (Altmetrics capture more than just social media footprint) and (ii) page 3: "Alternative metrics or 'Altmetrics', are contemporary measures of impact ... ." This implies that citations are outdated measures of impact. I disagree.
Answer: We agree that Altmetrics capture more than social media activity and have amended the text accordingly. Our intention was not to dismiss traditional measures of impact. Altmetrics are a newer measure of impact than citations. As stated in our discussion/conclusion, we believe citations are still a useful measure of impact and we do not suggest that citations are outdated. We hope that the revised text makes this clear.

Comment 6:
The sentence with reference 2 starts with "currently" whereas reference 2 is from 2009. Claims about the current status should not be referenced with papers that are more than ten years old. Answer: We have removed the term 'current'.

Comment 7:
In the next-to-last paragraph of the introduction, a single sentence is backed up with references 5 and 9-19. Are that many references necessary to back-up this sentence? I think the references should be discussed in a bit more detail in such cases. The sentence following references 5 and 9-19, however, is not backed-up by any reference: "However, there has been limited research describing which factors influence Altmetrics." If there has been some research regarding this, some of it should be cited. Answer: The references are referring to the "various features" which have been shown to influence a paper's citation count. These various features are shown in Table 1. They are also mentioned in the discussion as to how they affect a paper's citation count. Our included references also include some detail on Altmetrics and so, support our text.

Comment 8:
The second paragraph of the methods section states: "All aspects of searching, data extraction and analysis were performed by a single trained researcher (LW) with additional input for internal validation as described below." Somehow, this sentence implies universal training of LW. I am unsure if this sentence is properly located in the methods section as it partly sounds like an incomplete statement of contributions to the work. The following sentence states that "all data are in the public domain and collated data are included in this paper." I am unsure if Google Scholar citations and Altmetric Attention Scores are data in the public domain. The authors also state in the data availability section: "All data underlying the results are available as part of the article and no additional source data was required." Answer: The researcher was trained in use of Google Scholar and Altmetric resources. Google Scholar and the Stroke website are freely accessible to the public. The study-specific data are available on the ResearchGate resource; we have made this clearer in the first sentence of the methods section.
Comment 9: This very short paragraph of the sub-section "validation" in the section "results" should be expanded. Answer: We have added some further description to the text.
Comment 10: I am confused by tables 2 and 3: The sample size has Q3 and maximum values higher than the number of analyzed papers. How were the samples constructed, or how was the sample size defined? Answer: Q3 and max refer to the sample sizes of the included studies, not the number of studies. The sample size of each study was as stated in the paper itself.
Comment 11: This final two paragraphs state that 21% of the variance in citations and only 3% of the variance in Altmetric Attention Score are explained. I doubt that the results based on that are useful. Answer: This is an important point and also raised by the first peer reviewer. As per our response to Comment 3, we have added more text on this aspect to the Discussion.

Comment 12:
The first paragraph of the discussion section states: "There is no agreement on what constitutes a 'high' citation count and measures will vary according to year of publication and topic. Altmetric Scores for Stroke papers ranged to greater than 500; however, there is no consensus value for a high Altmetric score, although previous data suggests that a score of 20 or more is relatively high 21." Usually, the top-X% (X is often chosen as 10) are used as a threshold for a high citation count for publication years and fields, sometimes also document types. Also, the Altmetric Attention Score depends on the field, possibly also on time. Actually, the article detail page of Altmetric.com provides a placement within the top-X%, see for example: https://www.altmetric.com/details/13596962 Answer: We felt that the placement within the top X% of all Altmetric scores (seen in the top left-hand corner of the linked article) wouldn't be a fair comparison, as this would include papers published in multiple different journals/different areas of research. We opted to mention an absolute threshold in the Discussion, but this is included only to give context to the results.
Comment 13: I struggle to understand Figure 2: the y axis has the label "measure of impact" and the y axis contains negative values. The calculation of this "measure of impact" should be explained better. Answer: The different colours refer to citations/Altmetric score (the measure of impact). The graph compares different categories of paper (listed in Table 1) to Population studies (chosen as the comparator as this yielded the most statistically significant results). The graph demonstrates how other categories of paper compare to population studies in terms of citations/Altmetrics. Negative values therefore mean that, on average, those categories receive fewer citations/lower Altmetric scores than population studies, whilst positive values indicate these categories receive more. We have added text to help the reader understand the data visualization.
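The quantity plotted in Figure 2 is, in effect, a per-category difference from the comparator category. A minimal sketch of that calculation (with hypothetical category means, not the paper's data):

```python
# Hypothetical mean citation counts by paper category.
mean_citations = {
    "Population studies": 30.0,  # comparator category
    "Pre-clinical": 14.0,
    "Clinical trial": 38.0,
}

comparator = mean_citations["Population studies"]
# Difference from the comparator: negative = fewer citations on average,
# positive = more citations on average than Population studies.
diff_vs_population = {cat: m - comparator for cat, m in mean_citations.items()}
```

By construction the comparator sits at zero, which is why the y axis of such a plot must span negative values.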

Comment 14:
The first paragraph on page 10 below Figure 3 states: "Certain types of research seemed to generate reduced impact, in general pre-clinical research papers had both lower Altmetrics and fewer citations. One could argue that these results reflect the predominant clinical focus and readership of the journals rather than a lesser impact of preclinical research." In this context, the paper "Citation Analysis May Severely Underestimate the Impact of Clinical Research as Compared to Basic Research" by N. J. van Eck et al. (DOI 10.1371/journal.pone.0062395) might be of interest. Answer: Thank you for highlighting this paper. We agree it is very relevant and we have included it in the revised manuscript.

Comment 15:
The methodology behind Table 6 and Figure 3 should be explained in much more detail. What are "Population Studies" in this case? Why was this odds ratio analysis performed only for the Altmetric Attention Score and not for citation counts? Answer: We have added more detail on the analyses and their interpretation (Methods and Results sections).

Comment 16:
The last paragraph of the discussion debates the proper handling of missing values of the Altmetric Attention Score. The proper handling should be a score of zero. It is very unlikely that Altmetric.com missed a mention of a paper in one of their main data sources such as Twitter. One can speculate about many other reasons. Those would in a similar way also apply to citations, e.g., citations are missing because of incomplete citation linking, and we still use citation counts of zero. Answer: We apologize if the wording was unclear; where Altmetric scores were labelled as missing, there were no data available for the paper, not a score of zero. Given the lack of data for these papers, they were excluded from analysis. We suspect this action did not bias the results as there were only 22 such papers. We have reworded to make this clearer.