Starting a blog for some reason.

Why?

I started my PhD program in the Fall of 2023. At the time, I wasn’t sure whether I was even qualified to pursue this dream of mine. Only a few weeks prior, I was slogging my way through my final undergraduate course (linear algebra) while failing to resist all the distractions present in my life. In the beginning of my program, I was constantly concerned that I would immediately fail out for one reason or another. Luckily, that did not happen. My lifelong addiction to puzzle-solving fueled me through hundreds of hours of subpar programming, some of which yielded meaningful, publishable results. The ‘publishable’ part would certainly not have been possible without my advisor, Dr. Rickard Ewetz, who against all odds was able to teach me how to write research papers that are both scientifically interesting and academically stylish (at least, in the opinion of some reviewers).

At this point in my quest for that sweet, sweet piece of paper, I have made many mistakes and learned many lessons. Unfortunately, I am extremely forgetful, and so I need a place to store my thoughts, intentions, results, progress, and whatever else proves useful or interesting enough to document. I suspect most people who have this problem keep a journal, which is a perfectly valid solution. Personally, I need some sort of accountability system to ensure I actually maintain the habit of documenting my thoughts. My solution to this conundrum is to keep a public journal in the form of a blog. The idea here is that I will be motivated to continue updating this blog to avoid the embarrassment of failing to do so. Let’s hope it works.

PhD Week 66

Now that I’ve finished coping to an imaginary audience, let’s talk about where I’m at in my research. At this time, there are two papers in the publications section of this site. I finished writing both of them between Summer and Fall of 2024. Both are about applying language models to different types of graph navigation problems. The shortest possible summary I can give of the knowledge presented in these papers is:

  • Language models can solve graph problems more effectively by writing code than by producing solutions directly.
  • Language models can accurately interpret natural language descriptions of spaces into some type of graph.
  • Informative feedback loops between language models and the interpreter/compiler/solver of the target formal language are an important component of frameworks that use such models for program synthesis.

While these results are meaningful to the field, this is not the area that I’m most interested in researching. In the interval between writing those papers and now, I have worked primarily on two tasks. First, I have spent many hours creating a performant solution for the DARPA ANSR project, the details of which cannot yet be shared but will likely be published in some form in the coming months. Second, I have authored a third paper that will be submitted to the fast-approaching IJCAI’25. Details about this paper will be available when the first version is up on arXiv, but for now, here’s an overview.

Lemme tell you about the paper I’m writing

The paper is motivated by the general desire to translate natural language into a formal verification specification language, such as LTL or some other formulation of temporal logic. I’m sorry, but you’re just going to have to accept that this general desire exists. I had to do this and now so do you. By and large, the current approaches are different versions of “Parse the input real good and give it to a really solid language model.” By “really solid”, I mean “produces accurate translations”. Many off-the-shelf models are capable of producing a passable attempt at translation, zero-shot. However, the unfortunate reality is that this level of performance is the result of oversimplified benchmarks. Additionally, players of this game are comfortable with the liberal application of resource-intensive models simply because they produce good results with little additional engineering effort.

Some researchers have wisely begun to question this approach, and have returned to smaller models to explore methods of raising their translation accuracy to match that of models with expert reasoning abilities. These are what I would call true translation models, which differ from causal models such as GPT and Claude in the sense that they do not share a training objective; the former is trained for direct sequence-to-sequence translation of NL into TL, while the latter uses causal reasoning resembling human intelligence to derive a translation from the NL input. The difference here is not subtle. While the reasoning abilities exhibited by causal models are impressive, they are in many cases prohibitively expensive to deploy. My latest paper outlines a framework for NL-TL translation using a combination of AP masking with a masked language model, and translation with a sequence-to-sequence model that has been trained with a neuro-symbolic loss function (which is the primary contribution of this paper).
We were able to show, quite simply, that this loss function improves training convergence both in the number of steps to convergence and in the minimum loss achievable during training. Once again, more details of this approach will be available when I put the preprint on arXiv.
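To make the two-stage pipeline a little more concrete, here is a toy sketch of the data flow only. Everything here is a stand-in: in the actual framework, Stage 1 (AP masking) is a masked language model and Stage 2 is the seq2seq model trained with the neuro-symbolic loss; below, a regex and a lookup table play those roles, and all names (`mask_aps`, `TEMPLATES`, the quoted-phrase convention for atomic propositions) are hypothetical illustration choices, not the paper’s actual interface.

```python
import re

def mask_aps(sentence: str):
    """Stage 1 stand-in: tag quoted phrases as atomic propositions (APs),
    replacing each with a placeholder token (AP0, AP1, ...)."""
    aps = re.findall(r'"([^"]+)"', sentence)
    masked, mapping = sentence, {}
    for i, ap in enumerate(aps):
        placeholder = f"AP{i}"
        mapping[placeholder] = ap.replace(" ", "_")
        masked = masked.replace(f'"{ap}"', placeholder, 1)
    return masked, mapping

# Stage 2 stand-in: a lookup over masked templates instead of a trained
# seq2seq model. The masking step is what lets translation operate on a
# small, AP-independent template space.
TEMPLATES = {
    "globally , if AP0 then eventually AP1": "G(AP0 -> F(AP1))",
    "AP0 until AP1": "(AP0 U AP1)",
}

def translate(sentence: str) -> str:
    """Mask APs, translate the masked template to LTL, then restore APs."""
    masked, mapping = mask_aps(sentence)
    formula = TEMPLATES[masked]
    for placeholder, ap in mapping.items():
        formula = formula.replace(placeholder, ap)
    return formula

print(translate('globally , if "door open" then eventually "alarm sounds"'))
# -> G(door_open -> F(alarm_sounds))
```

The point of the sketch is the decomposition, not the components: masking pushes the open-vocabulary part of the problem (identifying APs) into Stage 1, so the translation model only ever sees a constrained template language.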