Lily Bhattacharjee

“One thing about doing a project on bias is that you start to notice your own…”

Interviewed by Justin Le on March 28, 2019

What's your year, major, semester in SAAS, and where are you from?

I'm a second year EECS major from Fremont (basically 40 min away from Berkeley) and this is my first semester in SAAS.

That's kinda close by. So how's the semester been going for you?

It's been pretty busy. I'm also in DSS and an undergraduate theory CS group, so this is actually the most involved in extracurriculars I've been at Cal so far. I love my classes though, especially Stat 153, and I feel that a lot of the material I've been learning in that class and others is relevant to side projects I'm working on.

What extracurriculars were you involved in before this semester then?

I've been kind of switching around, experimenting. When I first showed up at Cal, I thought I was going to either be EE-focused or at least evenly distribute my classes between EE and CS, so my first semester, I was in IEEE (Institute of Electrical and Electronics Engineers). I ended up finding CS more interesting after a few semesters, so I briefly joined a nonprofit to create DS / CS curriculum for women in prison and helped a research-focused club develop introductory CS curriculum as well. I ended up getting more into DS / stats this and last semester (when I joined DSS) and took Prob 140. That class was amazing, and even though I'm not sure what I want to do with my degree yet, it would be great if it involved data analysis in some way.

Wow, that's a lot of stuff. Is the latter part how you got interested in SAAS?

Yeah, I was looking for new clubs to apply to specifically in the realm of statistics this semester and I'd heard of SAAS before. Luckily, I ended up getting in.

Cool, so onto R&P. What have you been working on?

My project is centered around bias detection in US political news articles. My stretch goal is to create a web extension that highlights words / phrases with partisan connotations on a news page, but because this would be pretty challenging to code, I'm starting out by attempting to quantify left / right leanings for common news sources on controversial topics like immigration and abortion. Recently, I was looking into articles published by NYT in the last 2 months concerning abortion, and it was pretty interesting to discover that not only do the majority of articles (> 80%) contain structurally and linguistically similar sentences despite being written by different reporters, but the similarity has only increased over time.

So you're looking at word usage in articles; how are you going from there into bias detection?

I'm focusing on style-based bias detection, although this isn't an awesome way of distinguishing biased reporting (~50% accuracy, [Source]). To this end, I'm planning on using the Watson Tone Analyzer API, which categorizes text from a list of several emotions, to get a better idea of the mood of each piece, judge the article's sentiment towards a particular issue, and associate that with its closest political leaning. I'm also planning on implementing a fact-checking aspect in my final product; previous projects that have worked on similar problems have done that by checking individual sentences against Politifact.com.

What do you envision your final result or report looking like?

I started this project because I wanted to know how this image was put together — if it was a crowdsourced judgment, based on the biases of the writers of the articles I found this in, or actually quantifiable. I would like to create and justify an image like this for every hot button issue I explore for my project.

Where did you learn about the techniques and analysis methods you mentioned above?

The first few weeks of R&P were more research-focused so we could gain a better understanding of what progress researchers had already made on our problems and how they'd approached them. What I mentioned is a somewhat tweaked implementation of what researchers have already published in papers, because a few of them had access to corpuses that haven't been made public and I'm trying to replace them with APIs that have already been trained for sentiment analysis. For the article and summary text specifically, I'm following a methodology published here — considering positive / negative / neutral classifications as well as parts of speech and domain-specific vocabulary (other papers ended up making their own left / right leaning keywords lists because there isn't an easily accessible public database: [Interviewee's Source]).

I'd be lying if I said I understood all that. But it sounds like you have everything under control. I was more wondering where you learned all of this.

Oh lol sorry for not explaining that well. We had a research phase where we read a bunch of papers on our respective topics. That's where I heard about these types of analyses… does that answer the question? I've also been seeking out help from the R&P leaders when necessary; this is my first contact with NLP so their comments have been helping me refine my workflow a lot.

Did you have any difficulties or struggles with the project?

I feel like I've faced difficulties understanding how to use spaCy, NLTK, and associated display libraries to do exactly what I intend them to, as well as interpreting any trends I see. One thing about doing a project on bias is that you start to notice your own… which is why I'm trying to cover as many "controversial topics" in current politics as I can, because even picking the ones that seem important and selecting the news search keywords introduces bias. Also, right and left are really fuzzy terms so I've encountered some trouble trying to justify the definitions I'm using for this project, and I'll definitely be clarifying exactly what is generally considered to be in each category in my final report.

That makes sense. Did you have anything else you wanted to say regarding the project?

No, not really.

This is your first semester in SAAS – what do you think of the club?

It's really friendly and open. I didn't have any friends in the club when I first joined, but I've gotten to know a few members through the donut channel, which I think is a great idea. The points system in particular is slightly stressful mainly because I find it difficult to make many of the events, but I understand why it exists. Committee meetings are really fun though; they're something I look forward to every week.

That's fair. You said that you didn't have any friends when you first joined which is understandable. But how did you make friends with people in the club, like how did you meet them?

It's kind of funny but the first person I met through donut is going to be my Stat 153 final project partner… when we met up we didn't even know we were in the same class but realized that when we started going in the same direction to lecture afterward. Other people I've made friends with are mostly in my committee, and we bond over the weekly assignments due at 11:59 Friday night (which always means part of my Friday is reserved for SAAS) and the fact that the conference rooms booked for our meetings always have < 5 chairs for some reason. I also realized that one of my fellow DSS committee members is also in SAAS so I think I've mostly been making friends by finding overlaps for the people I meet here in other parts of my life… it's hard to keep in contact with donut pairs / triples I only meet for an hour otherwise.

That's it for SAAS general questions unless you had something else you wanted to mention.

No, I'm good.

Interviewer's note: I gave a heads-up and got permission to ask the following questions beforehand.

As someone majoring in a math/stats/CS related field, have you ever felt discriminated against or slighted, or otherwise had a negative experience that you felt was due to gender?

I've been lucky enough to not experience any form of overt discrimination from staff or students.

Do you have any thoughts regarding the culture in these fields with relation to gender?

Some parts of it are pretty toxic, but I try my best to stay away from them. One of the reasons I ended up leaving [club name redacted] was because I was one of the only women in the club and the only woman in my committee, and not only did I feel lonely, I sometimes felt like I was being held to higher standards than others in my committee. I don't know how much of my perception was real and how much was all in my head, which is why I hesitate to term it as discrimination. But I have to say that especially in CS (although it seems to be getting better), there will occasionally be negative experiences (for me, usually in labs or hackathons) where I come out thinking I don't belong. I find a lot of solidarity in other underrepresented STEM students who've felt the same way. I was in the WiSE theme program my freshman year and still maintain some of the friendships I made there.

Is there anything you would like to say to anyone who might be reading this, regarding pretty much anything?

I guess I'll give some class / finding your true passion advice haha — if you ever find yourself getting bored of a class's material, especially if it's in your major, push yourself to set aside some time one evening and build / read something tangentially related on your own. This is helpful not only for actually understanding theoretical concepts from a practical perspective but also for giving you a reason not to space out during the next lecture because you just made what the professor is talking about.

The website version of this interview was mildly edited for length and clarity by Abhinav Bhaskar.
