Analyzing and Predicting Homosexual Language from the Perspective of Gender Natural Language Processing

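In current Chinese gender natural language processing (GenderNLP) research, most studies focus solely on biological sex, and the automatic classification of gendered texts has been built only on texts written by heterosexual men and women. From the standpoint of the humanities, however, the complexity of gender itself also affects how language is used. This thesis is therefore one of the few studies in Chinese GenderNLP that approaches gendered text classification from the perspective of sexual orientation. First, to show that sexual orientation is also a valid criterion for classifying gendered texts, this thesis collects texts written by homosexual males, heterosexual males, homosexual females, and heterosexual females from the Chinese bulletin board PTT. It then trains separate classifiers, a convolutional neural network (CNN) supplied with Word2Vec word embeddings and a support vector machine (SVM) paired with a linguistic feature set, to detect the sexual orientation information carried in Chinese male texts and female texts. The training results show that, with either the CNN or the SVM, the trained classifiers successfully separate homosexual from heterosexual texts when measured against the chance level (accuracy 0.5). Second, unlike earlier work that classified only the texts of heterosexual men and women, this thesis uses the same machine learning models and text features to train a classifier that distinguishes texts by homosexual males from texts by homosexual females. In addition, texts from Chinese homosexual forums are collected to test the trained classifier, to show that it not only predicts homosexual texts from PTT but also adapts to homosexual texts from other online sources, namely UThome and 2Girl. Under limited time and computing resources, the results show that the SVM outperforms the CNN in distinguishing male from female homosexual texts. Finally, a linguistic analysis of male and female homosexual texts shows that texts of different genders differ not only in their use of content words but also, with statistically significant differences, in function words, punctuation, syntactic structure, and statistical measures such as lexical richness, character count, word count, and information predictability.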

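Keywords

gender natural language processing; lavender linguistics; homosexual texts; convolutional neural networks; support vector machines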

Parallel Abstract

At present, research on gender and natural language processing (GenderNLP) usually targets gender-normative language spoken by biological males and females. From the standpoint of the humanities, however, language is subject to many influences, including the complexity of gender. For this reason, this thesis explores gendered texts from the perspective of sexual orientation within Chinese GenderNLP. First, to show that gendered texts can be categorized not only by biological sex but also by sexual orientation, this thesis adopts both Convolutional Neural Networks (CNNs) and a Support Vector Machine (SVM), using Word2Vec embeddings and a linguistic feature set as input vectors, to train classifiers that categorize texts written by homosexual males, heterosexual males, homosexual females, and heterosexual females. Measured simply against a threshold of 0.5 in this pilot experiment, the training results show that, with either CNNs or the SVM, the trained classifiers are able to distinguish homosexual texts from heterosexual texts collected from the Chinese social media platform PTT. Second, adopting the same model settings as in the pilot experiment, this thesis trains another classifier to automatically tell apart texts written by homosexual males and homosexual females. In addition, because the trained classifier is expected not to be limited to homosexual texts from a single source but also to classify gendered data from different textual environments correctly, homosexual texts from two other online sources, UThome and 2Girl, are also collected and used. Under the experimental limits of time and computing resources, the results show that in this homosexual classification task the SVM model is likely to outperform the CNN model.
Furthermore, the linguistic analysis of homosexual texts finds that gendered texts differ not only in the use of content words: linguistic features such as function words, punctuation, and syntactic structure, and statistical measurements such as lexical diversity, word count, character count, and information unpredictability, also show statistically significant differences in the homosexual classification tasks.
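
As a rough illustration of the statistical measurements named above, the following Python sketch computes a few surface features for a single text. It is not the thesis's feature extractor: the function name `surface_features`, the punctuation set, the use of the type-token ratio for lexical diversity, and the use of unigram entropy as a stand-in for information unpredictability are all assumptions made for this example. The input is assumed to be already segmented into words by an external tool.

```python
import math
from collections import Counter

# Assumed punctuation inventory (full-width and ASCII marks), for illustration only.
PUNCTUATION = set("，。！？；：、…~,.!?;:")


def is_punct(token):
    """Treat a token as punctuation if every character is a punctuation mark."""
    return all(ch in PUNCTUATION for ch in token)


def surface_features(tokens):
    """Compute simple surface statistics for one pre-segmented text.

    `tokens` is a list of word strings; segmentation is assumed to have
    been done beforehand. The measures are hypothetical stand-ins for the
    statistical features mentioned in the abstract.
    """
    words = [t for t in tokens if not is_punct(t)]
    counts = Counter(words)
    n_words = len(words)

    # Type-token ratio: distinct words divided by total words.
    ttr = len(counts) / n_words if n_words else 0.0

    # Shannon entropy (in bits) of the unigram distribution of this text.
    entropy = 0.0
    for c in counts.values():
        p = c / n_words
        entropy -= p * math.log2(p)

    return {
        "word_count": n_words,
        "char_count": sum(len(w) for w in words),
        "punct_count": len(tokens) - n_words,
        "type_token_ratio": ttr,
        "unigram_entropy": entropy,
    }


# Example call on a toy, already segmented sentence.
example = surface_features(["我", "喜歡", "你", "，", "你", "喜歡", "我", "！"])
```

Feature dictionaries of this kind could be stacked into a matrix and passed to a classifier such as an SVM; how the thesis actually encodes and scales its linguistic feature set is not stated in the abstract.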

Parallel Keywords

GenderNLP; Lavender Linguistics; homosexual texts; convolutional neural networks; support vector machine


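Cited By

Hsieh, S.-K., & Tseng, Y.-H. (2019). Deep lexicon: Toward a knowledge-oriented foundation for artificial intelligence [in Chinese]. Chinese Journal of Psychology, 61(3), 231–247. https://doi.org/10.6129/CJP.201909_61(3).0004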
