What Big Data and Big Models Bring to the Table
Cassandra Roundtable 5
University of California, Berkeley.
February 24, 2023, 11am PST.
- Kawin Ethayarajh, Stanford
- Kristina Gligorić, Stanford
- Lucy Li, Berkeley
- Austin van Loon, Stanford
- Arya D. McCarthy
- Giovanna Maria Dora Dore
I think one aspect of large models that gets attention is their ability to generate and extract useful features. Another interesting feature is the ability to ask questions at a much larger scale.
Big models have a lot of blind spots, especially the latest generation of language models. They are trained on corpora from the web, which is disproportionately in English compared to all the content that is out there in the world. That content also tends to come from certain subsets of the population, whose opinions and views are then overrepresented. As we move forward, we need to think about the data that go into these models and whose opinions are reflected.
I want to focus on big data because big models cannot really be separated from the large behavioral datasets they are trained on. I think the benefits that big data, and behavioral data, bring to the table are a bit taken for granted, particularly for social scientists (…) Before, behavioral data was very scarce, costly, and complicated even to collect. But now, with digitalization, that has changed.
We talked about big models helping test theories in social science. It is also important to think about the possibility of big models helping develop new theories in social science or in general.
Big models bring to the table better accuracy, better F1 scores, and better performance for a bunch of different tasks.
Using different models to validate the analysis and seeing where the models disagree could be very generative.
Austin van Loon
Meta-theory is not set up well to deal with machine learning. So, there is an aversion among social scientists because they think that if you are doing prediction, you are not doing explanation. I actually do not think that this is right.
Social scientists need to update how they test theories and how they think about testing them.