Summary
- I provide some info about my path from astrophysics postdoc to Anthropic Individual Contributor (IC). This includes some narrative, some details of what I focused on for upskilling, and some retrospective advice for how other people can make a similar change.
- Broadly, I recommend the following:
- Build your coding skills. Know your algorithms, and learn how to contribute code to someone else’s repository.
- Practice interviewing; it’s a separate skill you need to be good at.
- Understand the basics of machine learning / deep learning if you want to go into AI. Get comfortable with, and build some intuition for, the devloop there.
Background
Back in 2021 (post-pandemic, but we were finally getting to work with people in person again!) I was at a KITP program, meeting or reconnecting with people who would go on to be very close collaborators and friends. Adam Jermyn was still doing astrophysics research at the time, and we spent a lot of time together at that workshop. I was convinced that Adam was a shoo-in for pretty much any astrophysics professorship he wanted. But – that wasn’t his plan. He was fed up with the academic workload, the low pay, the…well, lots of things. And he thought there were more important problems he could be working on. He decided to leave the field (which to an academic is basically the same thing as dying), moving on to independent AI safety research in mid-2022 and eventually joining Anthropic’s interpretability team before anyone really knew what Anthropic was. We kept in touch.
Fast-forward to the summer of 2023. I was on track to a really good career in astrophysics research (probably on track to land a professorship!), but I was struggling with many aspects of academic life. Most importantly, I found it hard to explain why my work mattered to anyone who wasn’t an astrophysics or fluid dynamics researcher (note: I’m talking about being honest and real, not “I’m writing a grant so my work is important”-speak – which was another thing that weighed on me). The world (me included) had its “oh, huh” ChatGPT moment in late 2022, and AI was increasingly in the public sphere. I asked Adam for some resources on AI safety and interpretability, and started reading (these references were: on why AI safety is important broadly [1], a few blog posts by Holden Karnofsky [2,3,4,5], the former CEO of Open Philanthropy, and Chris Olah’s 2023 Interpretability Dreams). I realized that this field was really interesting and also really young. That felt refreshing compared to my previous work of finding a niche within a niche that no one had figured out in the past ~century and becoming the person who does that™. I also found that, to me, AI work felt important. AI will almost definitely be a world-transforming technology, and I wanted to be part of a team that was trying to make that transformation go well.
With all this running through my head, I started an astrophysics postdoc position at KITP in the fall of 2023. I knew I wasn’t interested in pursuing the academic track any longer, and I wanted to pivot towards AI / AI safety work. I told my advisor (Lars Bildsten) this, and he responded that the goal of my time at KITP should be to prepare myself for the career I wanted and to help get me there (Lars is great, and fortunately I was on a cushy fellowship). I flailed a little bit at trying more astro work to see if I could stick with it, but I just really didn’t have the passion for it anymore, and pretty quickly decided that what I wanted to do was start a career in AI safety research. This blog post is a description of how I spent that year leading up to starting a job on the Anthropic pretraining team in September 2024 (not safety work, but capabilities work! Not something I would be comfortable doing at other frontier labs).
I had a lot to learn
I started diving into AI interpretability papers and code in ~August 2023 (mostly the Distill circuits articles and what was then published in the Transformer Circuits thread). I talked with Adam again in ~September 2023 (right as I started my new postdoc), decided to take the leap and apply to Anthropic, and submitted an application a few days later. I reached the final stage of interviews (the “loop” or “onsite”), but I didn’t receive an offer and was told to come back in 6 months after “upskilling” some more. Note: you don’t receive direct feedback when you don’t receive an offer, so it wasn’t entirely clear what I needed to work on. After some reflection on what I felt went wrong, I decided to work on the stuff I lay out below. When I came back half a year later, I got an offer! This is not a particularly rare story at Anthropic (trying a couple of times before receiving an offer), but I think people in academia are trained to think they only have one shot at that dream job (that’s how it is if you want to, say, be a professor at Berkeley). I don’t know if this is true at all companies, but my experience at Anthropic broke me out of that academic training. My new belief is that you can pretty much try every 6 months if you can show you’ve been improving and learning, assuming these are true:
- You know someone inside the company – they think you’re good and are willing to recommend you (this gets you past the resume auto-filter)
- You don’t completely bomb your interviews. That is, you seem like a good person to work with even if you don’t display technical competency on the interview questions for whatever reason(s).
So – my advice: don’t spend 6 months upskilling to apply – spend a couple of months upskilling and then apply, and if it doesn’t work out, then come back again in a few months and try again! Two shots in 8 months is probably better than two in 12 months, y’know?
Anyways, all that said, below is my interpretation of where I fell short on my first round of interviews and what I worked on to make sure the second round went better.
What went wrong
I think these are basically the things that went wrong in my first round of interviews:
- I needed to beef up my coding skills. The code used in academia (outside of computer science, presumably) is…not great. The lowest-hanging fruit here: I needed to learn how to use data structures other than lists and arrays, which were what I’d been using ~exclusively since starting grad school.
- My probability and statistics were stale, and these are good to know to help you understand things like how to initialize weights in a neural network, or how to normalize activations in one.
- I wasn’t familiar with the machine learning devloop. I didn’t know what to look for when training models. I knew “loss go down = good,” but beyond that I didn’t have any intuition for how things should look, or how to read the tea leaves in a loss curve and understand what was happening or going wrong.
- I was completely new to technical interviews. Technical interviews at tech companies come with their own skillset, and you need to polish that skill before applying somewhere so that you feel comfortable in the environment.
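As a concrete example of where the probability and statistics show up: whether activations keep a sane scale depends on how you initialize weights. Here's a minimal numpy sketch of the variance argument (sizes and names are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 1024
x = rng.standard_normal((512, fan_in))  # batch of unit-variance inputs

# Naive init: unit-variance weights inflate the output scale by ~sqrt(fan_in),
# because each output is a sum of fan_in independent terms.
W_naive = rng.standard_normal((fan_in, fan_in))
print(np.std(x @ W_naive))   # ~32, i.e. sqrt(1024)

# Variance-preserving init (the idea behind Xavier/He-style schemes):
# scale weights by 1/sqrt(fan_in) so the output variance stays ~1.
W_scaled = rng.standard_normal((fan_in, fan_in)) / np.sqrt(fan_in)
print(np.std(x @ W_scaled))  # ~1
```

Stack a few unscaled layers and the activations explode (or vanish, with small weights) – it's exactly the sum-of-independent-variables variance argument from an intro stats course.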
What I learned
I think at the time I was convinced that my biggest weakness was a lack of knowledge of the ML/AI field. I spent a lot of my time upskilling in those areas, and a bit of time also beefing up my coding skills. If I were to do it again, I’d probably have done closer to a 50-50 split, but I discuss that more in the next section. Anyways, here’s what I spent the next six months doing with some commentary:
- Building ML skills: I was roughly starting from scratch here, because I had not ever done any ML during my time in astrophysics. I tried to build these skills from the ground up:
- Intro classes: I was a postdoc at UCSB, so it was really easy to email professors and ask to audit (sit in the back of) their classes. I audited an intro Machine Learning course (and did the homework) and also sat in on an intro Artificial Intelligence course. If I had more time I would’ve liked to sit in on some kind of Deep Learning course, but it didn’t work out that way. So I did the next bullet for that.
- Online courses: I took some online courses to give me some breadth and some insight into newer ML techniques (deep learning):
- Fast.ai’s practical machine learning for coders course – this is a nice one if you want to play around with public AI models quickly and get a feel for the types of things modern AI systems can do.
- MIT’s intro to deep learning course – Gives a high level overview of some of the most important modern architectures.
- Stanford’s Natural Language Processing (NLP) course – LLMs are the current state-of-the-art in NLP. I didn’t finish this course, but it gave me some understanding of the history and depth of the field.
- ARENA notebooks: I was interested in interpretability specifically. I worked through a bunch of the ARENA notebooks on interpretability, which let me get my hands dirty with “how do we make claims about what’s happening in a model” (and how to use PyTorch, and how transformers are structured). If nothing else, anyone who wants to work with LLMs should do the first chapter, where you build GPT-2 from scratch.
- Small research projects: I did some small research projects and wrote blog posts and lesswrong posts to get some exposure and practice working in the field (all pretty much done on cheap colab or kaggle GPU resources). See https://evanhanders.blog/
- MATS: I did a project in summer 2024 through MATS – I’m conflicted about whether this was worth it. I think it was good exposure to the way that people in AI research and safety research think, and MATS is a known commodity in the community that I could point to on my applications. But – it’s a 12 week program, geared towards people who have never done research, and I definitely felt out of place in that community as a PhD/postdoc with 10 years of research experience.
- Brush up on stats: I read parts of All of Statistics to remind myself how that field works.
- Read papers: I read all of the Anthropic Transformer Circuits thread and some of the papers referenced therein.
- Neel Nanda’s ‘getting started in interp’: I took much of Neel’s advice here, but you probably want to look at the updated version linked in that article for more up-to-date tips.
- I went to EA Global to meet other AI safety researchers. This was probably worth the weekend I spent doing it. EA Global is not very talk-heavy and is very “have one-on-one meetings” heavy, so I learned about job openings, met a few collaborators, etc. If you get the app, reach out to people, and set up meetings, your time will be well spent.
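For context on what those "build GPT-2 from scratch" exercises involve: the heart of a GPT-style transformer is scaled dot-product attention. Here's a minimal single-head numpy sketch (my own illustrative version, not ARENA's code – the shapes and names are made up):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, causal=True):
    """Scaled dot-product attention for a single head.
    Q, K, V: (seq_len, d_head) arrays."""
    d_head = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_head)  # (seq, seq) similarity matrix
    if causal:
        # Each position may only attend to itself and earlier positions.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # (seq, d_head)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
out = attention(x, x, x)  # self-attention: queries, keys, values all from x
print(out.shape)          # (8, 16)
```

A real model adds learned projection matrices, multiple heads, MLP blocks, and layer norms around this, but the exercise of getting this one function right (masking, scaling, softmax axis) is where most of the intuition comes from.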
- Building Coding Skills:
- Leetcode – Sure, Anthropic interviews aren’t leetcode interviews. But I needed to refresh in my mind all of the fundamentals of computer science, data structures, and algorithms. Leetcode was a great place for this. I’d recommend the interview top 150. Work through a few different problems in each of the sections (array/string, hashmaps, queues, etc…) so that you’re comfortable with those algorithms and data structures. Do it blind, then read the recommended solution in detail so you understand how to do it better. Then do it again in the ‘right’ way to build the muscle memory and get a feel for writing the best solution. Practice big-O notation on time and memory complexity.
- Leetcode online tests – once I felt good about the above, I tried out some ‘fake’ interviews: https://leetcode.com/assessment/ – you can pick 2, 3, or 4 problems (with a timer for 1, 1.5, or 2 hours). This was a very low-stakes way to see if I ‘got it’ (did I know how to pull out a data structure or algorithm when I needed to?). I think these questions are randomly drawn (from a company-specific pool) and helped me identify gaps where I needed to spend more time. Also this was a super safe way to get used to the pressure of thinking, coming up with a solution, and coding on a clock.
- Practice interviewing – I mentioned that I needed to develop the ‘skill’ of doing a technical interview, and I tried to do that. In a coding interview, you receive a problem, talk through a solution and your thought process, then implement your solution, with live debugging along the way. Depending on how time goes, you sometimes then get a second part and extend your solution to do something more interesting. When I practiced, I used https://www.hellointerview.com/ (which had free AI-driven mock interviews at the time), but the site has changed a lot since. Modern AI models are much better than they were a couple years ago, so instead of relying on a website’s scaffolding, just get a Claude subscription for a month or two, ask Claude to roleplay an interviewer, and run through the interview loop with Claude as the interviewer. You can write your solution in an IDE, and when you reach a point where you’d ask the interviewer for clarification, copy+paste your solution and ask Claude. Give it a shot!
- Start using AI coding tools – when I was upskilling, Github Copilot was the main coding tool I learned to use and I was pretty sure it was the best thing ever. Claude Code is much better. Honestly, even just chatting to Claude about “what’s the right way to tackle this coding problem” is a really powerful tool. If you’re not using Claude (or some other production LLMs) in your coding, you’re missing out and falling behind with your coding skills. Note that you’ll have to “stand on your own” in interviews without AI assistance, but if you don’t understand something or need to learn how to do something, Claude’s a great teacher, and in your eventual tech job you will be using AI.
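To make the data-structures point concrete, here's the classic two-sum problem (the kind of thing the top-150 list drills): the brute-force nested loop is O(n²) time, while a hashmap (a Python dict) trades O(n) memory for a single O(n) pass.

```python
def two_sum_brute(nums, target):
    """O(n^2) time, O(1) memory: check every pair of indices."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return (i, j)
    return None

def two_sum_hashmap(nums, target):
    """O(n) time, O(n) memory: one pass, storing value -> index in a dict."""
    seen = {}
    for i, x in enumerate(nums):
        if target - x in seen:      # dict lookup is O(1) on average
            return (seen[target - x], i)
        seen[x] = i
    return None

print(two_sum_hashmap([2, 7, 11, 15], 9))  # (0, 1)
```

If you can look at a problem and immediately reach for "dict for O(1) lookup" or "heap for repeated min" or "two pointers on a sorted array," you're in good shape for these interviews.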
What I would spend more time on if I did it again
I learned a lot working through the list of things mentioned above. That said, I think I’d spend more time on the coding skills and a little less time on the ML stuff if I were to upskill again. Learning the basics is good, but I probably spent too much time on ML courses learning about the history of AI, etc, and I tried to do too many small research projects. I had already proven myself a competent researcher in astrophysics, and my resume showed that, but I just didn’t have the coding chops to do regular engineering work someplace like Anthropic. That being said, here’s probably what I would do differently:
- Building Coding Skills:
- Practice the coding devloop and code review: You’ll be doing this every day in ML, and you should be comfortable with it. The best way to get comfortable is to be involved in a project where you’re bugfixing other peoples’ code! See below, and hat-tip to Zac Hatfield-Dodds for this advice:
- How do I find a project to contribute code to? Go to https://numfocus.org/ – it lists all your favorite python packages, and they’re all open source. They also have people who maintain the repositories, so you’ll get feedback in a reasonable time. They want your free labor, so don’t be scared.
- What do I do once I choose a project? Go to the project’s GitHub page and look for the instructions on how to contribute. Install the environment for contributing, which should have things like CI built in – you’ll want to get comfortable using those. Once you have things installed, browse the issues and look for one tagged with something like “good first PR” or “good first contribution.” These are usually small refactors. The point here is to get your environment working and fix something small.
- If you don’t feel confident about this – ask Claude for help on how to make the change and talk it through! And – this whole exercise is partially to help you build some thicker skin. Code has bugs. It needs to be reviewed. You need to build this skill.
- Once you’ve made your first contribution – congrats! You have your environment set up and know the guidelines for contributing to that package. That’s a big step. Next, look for something a bit harder, and try to make a bigger change than a “good first” one!
- Know your data structures – I had a minor in CS in undergrad, so some of the leetcode stuff was just brushing up for me. If you never took CS courses, I’d recommend auditing a (~sophomore level usually) data structures and algorithms course.
- Building ML skills:
- I wouldn’t have cut out too much in the way of courses: I think the basics here (intro ML course, intro deep learning course, coding up and training those things [doing homeworks]) were really useful for me as a total novice. Brushing up on stats would have helped me with the homework for these courses.
- On reflection, the best thing here was watching a professor reason through the math and learning how people in this field think and approach problems. I highly recommend going to lectures, taking written notes, and working through the math with professors.
- I would have skipped the basic ‘artificial intelligence’ course – the only really useful part was the reinforcement learning section.
- I tried to do too much independent research: I would have done fewer independent research blog posts and replaced them with posts about other skills I’d been practicing (like recreating the training of an autoencoder, or learning how some new data structure worked and applying it).
- ARENA was very useful: I should have worked through more of those notebooks sooner, and I should have read the supplemental papers with those notebooks instead of just getting the code working.
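As an example of the kind of small exercise I mean by recreating the training of an autoencoder: here's a linear autoencoder on synthetic low-rank data, trained with hand-written gradient descent (all sizes and names are illustrative – this is a sketch, not a recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data that secretly lives on a 2-D subspace of R^8.
latent = rng.standard_normal((200, 2))
basis = rng.standard_normal((2, 8))
X = latent @ basis

# Linear autoencoder: encode R^8 -> R^2, then decode back to R^8.
W_enc = rng.standard_normal((8, 2)) * 0.1
W_dec = rng.standard_normal((2, 8)) * 0.1

lr = 0.1
losses = []
for step in range(1000):
    Z = X @ W_enc          # encode
    X_hat = Z @ W_dec      # decode
    err = X_hat - X
    losses.append(np.mean(err ** 2))
    # Gradients of the mean squared error w.r.t. each weight matrix.
    grad_dec = Z.T @ err * (2 / err.size)
    grad_enc = X.T @ (err @ W_dec.T) * (2 / err.size)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(losses[0], losses[-1])  # the loss should drop substantially
```

Plotting `losses` as it trains is also a tiny, safe way to practice reading a loss curve – try changing `lr` and watching what divergence or a plateau looks like.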
On Rebranding
If you’re reading this, have a PhD, and are thinking about changing careers, I have some good news for you: you have a lot of transferable skills! You just need to learn how to communicate your skills and expertise to a new group of people who value totally different things from your academic community.
My academic background was in designing, running, and analyzing big astrophysical fluid dynamics computer simulations. I was good at what I did, and probably would’ve described my day-to-day skills with language like this: running simulations at state-of-the-art resolutions, using python packages like numpy and scipy, data visualization, writing and reviewing papers and talks, and so on. Fortunately, some of those skills can be communicated directly to my new colleagues without reframing (numpy, scipy, data visualization). However, “state-of-the-art resolutions” doesn’t really mean anything to people in ML; saying instead that I had experience with high performance computing, and that my simulations used hundreds of Eflops, helped bridge the gap between what I had done and what people were doing in my new field. The same reframing works for you: writing papers and designing talks is experience in technical writing and presentation; working with collaborators requires communication; academic mentorship requires both leadership and project management.
Finding the right words to express your skills can be hard, and I would honestly recommend putting your CV into Claude, saying what field you want to switch to, and asking Claude to help you reframe your skills and experiences in a way that will be meaningful to the people in the field you’re going into (probably AI if you’re reading this!). Then, when you’re done, send your CV to a friend in that field to see if they know what you’re trying to convey and to make sure Claude didn’t hallucinate anything (this happens less and less frequently these days, though).
Conclusion
Changing careers is really scary. In our society, we derive a lot of our self-worth from our jobs. When you’re deep in an academic career, you’ve probably spent years attaching your self-worth to your job. Leaving that job behind to start something new can be paralyzing.
For me, changing fields from astrophysics to AI and landing a job at Anthropic was one of the best choices I’ve ever made. I really enjoy working with my coworkers (who are collaborators instead of competitors for the same job pool). I don’t have the pressure to publish for the sake of…publishing (if a project doesn’t pan out, I can write a quick doc on results and drop it; if something does pan out, I can write a quick doc and help integrate it into upstream things). I have my dream job, and I don’t have to compromise on it (I live where I want, I don’t have to go through the slog of the postdoc track to get there, and I don’t have to dedicate myself to this job for literally the rest of my life). Also – I don’t have to do 3 jobs! Leading a research group, teaching, and being an administrator are all full time jobs and professors are asked to do all three. I’m an individual contributor (IC), so I do research and communicate the results to my colleagues. I may want to be a manager (some admin, some leading a research group) because I love mentorship, but that’s a choice that’s up to me and I can make it when and if I want.
In summary, you can definitely develop the skills you need to move away from academia. It’s just hard to know where to put your time and what to focus on. I successfully switched careers using the roadmap above, and hopefully my experience helps you land your dream job a bit more efficiently than I did!