Sunday, May 11, 2008

By Varun

Since quite a few individuals have asked me questions on programming (and related sub-topics), I thought it would be a good idea to write a small primer on the subject, namely:
  1. Why Do I Need Programming? (skippable, if you know why)
  2. What Are My Goals?
  3. Which Language Do I Choose?
  4. How Do I Go About Learning?
  5. Final Thoughts

Why Do I Need Programming?

QuantFS prides itself on the quality of its members, and no doubt every one of us knows our "theory", be it Econ, Finance, Math, or CompSci; however, if someone told you to implement it, where would you start? Well, maybe that's not fair...If you wanted to implement a regression model on CPI, M3, and Unemployment, that can easily be done in Excel, right? Well, what if you wanted a running regression on prices, re-estimated every time a new price comes in? Are you really going to sit there and re-run it all the time? No (not to mention that no one would pay you to click 'update').
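To make the "running regression" idea concrete, here is a minimal sketch in Python with NumPy. The language choice is ours for illustration, not something the post prescribes. The hypothetical function `rolling_ols` re-fits an ordinary least-squares regression of y on one or more regressors (say, CPI, M3, and Unemployment) over a trailing window, so every new observation produces a fresh set of coefficients without anyone clicking 'update'. The window length is an assumption you would tune to your data.

```python
import numpy as np

def rolling_ols(y, X, window):
    """Re-estimate y = a + X b on each trailing window of `window` observations.

    Returns an array of shape (n - window + 1, k + 1): intercept first, then slopes.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single regressor as one column
    n, k = X.shape
    if window < k + 1 or window > n:
        raise ValueError("window must cover at least k + 1 points and at most n")
    coefs = []
    for end in range(window, n + 1):
        # Design matrix for this window: a column of ones (intercept) plus regressors
        Xw = np.column_stack([np.ones(window), X[end - window:end]])
        yw = y[end - window:end]
        beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
        coefs.append(beta)
    return np.array(coefs)

# Usage with synthetic data (not real market data): y = 1 + 2x plus a little noise
rng = np.random.default_rng(0)
x = rng.normal(size=60)
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=60)
coefs = rolling_ols(y, x, window=24)
print(coefs[-1])  # latest [intercept, slope]; should land near [1, 2]
```

Each row of the result is one window's fit, so plotting a column over time shows how that coefficient drifts as new prices arrive.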

So where does this lead? Programming is the glue to all the theory we learn as aspiring quants. Want to go into pure theory without programming? You'd better be that 4.1 Math/Physics kid from Harvard; otherwise, learn some skills. Is programming a cop-out from pure theory? No; on the contrary, it only strengthens certain theories we learn about (CAPM, Optimization, all of CompSci theory, etc.). Why should you program if it can be outsourced? The answer to this one lies in the next section.

What Are My Goals?

  • If you want to develop software and sell it
    You should be majoring in Computer Science or Mathematics. Why? For intensive software deployment, you often need to understand how programs interact with the computer and how to optimize slow code, not to mention the theoretical edge and experience you gain from your courses. However, if you have a theoretical idea and want to outsource the programming, then you are an entrepreneur. It really is important to establish your goals within this segment, because the paths it leads to vary widely down the road...
  • If you want to use programming as a tool in research
    This is what QuantFS is preparing its members for. Scripting is the key here. If you have a large file of stock data and want to run statistics on it in real time (as we did in one workshop), or you have a set of variances/expected returns and want to find the optimal risky portfolio or the global minimum variance portfolio using mean-variance (Markowitz/CAPM) analysis (as we did in another workshop), then you need to learn how to script. So, what does this entail? The ideal candidate is pursuing a technical major with a solid non-technical background. Scripting revolves around clever ways to minimize running time and increase accuracy. Useful courses include: Discrete Math, Algorithms, Probability, Statistics, Regression, and Time Series Analysis. The key in scripting is, "you can't be too clever here." Clever workarounds are awesome.
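For the portfolio workshop mentioned above, the standard closed-form answers (short sales allowed, no other constraints) are w_gmv = inv(S) 1 / (1' inv(S) 1) for the global minimum variance portfolio and w_tan proportional to inv(S) (mu - rf 1) for the optimal risky (tangency) portfolio, where S is the covariance matrix of returns, mu the vector of expected returns, rf the risk-free rate, and 1 a vector of ones. A minimal Python/NumPy sketch follows; the function names and the two-asset inputs are illustrative assumptions, not data from the workshop.

```python
import numpy as np

def gmv_weights(cov):
    """Global minimum variance weights (short sales allowed): S^-1 1 / (1' S^-1 1)."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    z = np.linalg.solve(cov, ones)  # computes S^-1 1 without forming the inverse
    return z / z.sum()              # z.sum() equals 1' S^-1 1

def tangency_weights(mu, cov, rf):
    """Optimal risky (tangency) weights: S^-1 (mu - rf), rescaled to sum to 1."""
    cov = np.asarray(cov, dtype=float)
    excess = np.asarray(mu, dtype=float) - rf  # expected excess returns
    z = np.linalg.solve(cov, excess)
    return z / z.sum()

# Illustrative two-asset inputs (hypothetical numbers)
cov = [[0.04, 0.00],
       [0.00, 0.01]]
mu = [0.10, 0.06]
print(np.round(gmv_weights(cov), 4))                 # -> [0.2 0.8]
print(np.round(tangency_weights(mu, cov, 0.02), 4))  # -> [0.3333 0.6667]
```

Dividing by the sum assumes it is nonzero; for the tangency portfolio it also assumes the expected returns mostly exceed rf, otherwise the rescaled weights stop making economic sense. Adding constraints such as no short sales means swapping these formulas for a numerical optimizer.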

Which Language Do I Choose?

  • If you want to develop software and sell it
    Since programming power is the necessity here, recommended languages for deployment include (but are not limited to) C++/C#/C/Java for standard applications. If you want to network among computers, use the .NET Framework. If you want to deploy web-side stuff, think Perl, PHP, Python, or Ruby (and MySQL or some variant for databases). If you want some handle on setting up servers to run tasks, think UNIX. Like I said before, this path has a lot of options...
  • If you want to use programming as a tool in research
    I would almost require four language TYPES. "Types"? WTF? Yes, I like to think of scripting languages in four distinct groups: Text, Presentation, Power, and Data. Let's go over a few commonly used languages and see their strengths...
    VBA (Visual Basic for Applications) - Specifically for Microsoft Excel, VBA is great as a presentation language. Why? Most clients want summaries of your research or performance, and they love to see stuff in Excel. If you've worked a day in your life in Finance, you'll know that Finance = Excel. Is this the best thing to learn first? No, because it teaches bad programming habits; however, it is ridiculously easy to learn and use, and code written in it deploys quickly.
    Perl - This powerful language is the best in the Text category. Want to turn a CSV into a TDV (tab-delimited file)? Want to take a bunch of prices and stocks and sort them by date/ticker? Want to combine thirty 40 MB files and pull the items common to all of them into a separate text file? Perl is your choice, hands down. The nature of the language (Larry Wall, its creator, is a linguist) makes it very understandable and comprehensive. It is also a light install, is available for any OS, and has superfast deployment time. It also works well with CMD, batching, and linked processes.
    MATLAB - A common program/language among researchers, this tool is great for initial analysis and minor scripting, but it is not optimal for setting up systematic processes. It is also a great presentation tool for the select quants who use it (Goldman uses it, in case you were wondering). It's much more a theory person's "scripting" language. Deployment time is a little longer; however, it has some great commands for matrix algebra because it stores everything in matrix form (for example, a = 0 is stored as a 1x1 array whose value at (1,1) is 0).
    S-PLUS, SAS, SPSS, S, R - (yes, they are all different...kinda) Power. These are necessary tools for developing your scripting power. These are 'statistical' languages...which means that they are great at doing statistics...What? There's a category for this shit? Yes, there is! Deployment time is shorter than with MATLAB, and they are also more powerful because they are lighter on the system. R is great because it is open-source. S-PLUS has a nice GUI and help file. Make sure you learn one of these...once you have learned one, the others are just syntax differences.
    Bloomberg, FactSet, MarketQA - These are tools that let users pull data from a database (such as those listed below). Learning how to get data is just as important as playing with it. One day you will find yourself sitting at a job and your boss will say, "Hey, run a regression on the Dow 30 and the S&P 500's monthly total returns." To which you reply, "Sure...where do you have the data?" *BAM* You are fired. It is essential!
    Compustat, Worldscope, I/B/E/S, etc... - These are financial databases. There is no way to "learn" them; you have to work with them in a job and understand (and deal with errors in) the data, and this will double your power as a scripter.
    C/C++/C#/Java - These languages are interesting...They are very powerful (they have control over many system capabilities), they can do data mining, and they have a large community following (libraries with prewritten code), BUT they have long deployment times unless you re-use code...a lot...
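The post recommends Perl for the Text jobs listed above (CSV to TDV, sorting by date/ticker, pulling the lines common to many files). To keep one language across this page, here is a rough Python equivalent using only the standard `csv` module; it is a sketch, not a translation of any Perl the author wrote. The function names, the column positions, and the assumption that dates are ISO strings (YYYY-MM-DD, so string order matches date order) are all illustrative.

```python
import csv

def csv_to_tsv(src_path, dst_path):
    """Rewrite a comma-separated file as a tab-delimited (TDV) file, row by row."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t")
        for row in csv.reader(src):
            writer.writerow(row)

def sort_by_date_ticker(rows, date_col=0, ticker_col=1):
    """Sort rows by date, then ticker; assumes ISO dates (YYYY-MM-DD)."""
    return sorted(rows, key=lambda r: (r[date_col], r[ticker_col]))

def common_lines(paths):
    """Return the set of lines that appear in every file in `paths`."""
    common = None
    for path in paths:
        with open(path) as f:
            lines = {line.rstrip("\n") for line in f}
        # Keep only the lines seen in every file read so far
        common = lines if common is None else common & lines
    return common if common is not None else set()

def write_lines(lines, dst_path):
    """Write lines to a text file, one per line, sorted so reruns give identical output."""
    with open(dst_path, "w") as f:
        for line in sorted(lines):
            f.write(line + "\n")
```

Each file is read into a set, so memory use grows with file size; for a few dozen 40 MB files that is usually fine, but far larger inputs would call for sorting the files and merging them as streams instead.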

How Do I Go About Learning?

Ahh, the dreaded question...The *BEST* way to learn programming depends (again) on which path you take. To be honest, if you are going into software development, your coding skills can take time to develop, but the theory you learn should come first. However, if you (like me) are a scripter, experience is key. Therefore, I suggest that the only way to learn is by doing. That is easier said than done because, face it...where are you going to find interesting things to program? This has always stalled the learning process with programming. The typical dialogue is:

Kid: Hey, What language should I learn?
Me : Well, if you are going into Finance and you don't know VBA yet, learn that...it's expected at this point.
Kid: Cool, is there a good book you would recommend?
Me : Sure, get Microsoft Excel VBA Programming for the Absolute Beginner by Birnbaum. He teaches you by making you program games. It's a great way to learn.
Kid: Ok.

This is where the process usually stops, because there really is little connection between learning how to move a red cell around the spreadsheet and automating a series of regressions. *It does teach you syntax and makes you familiar with the GUI*, so it is necessary. What are some better projects to try after that book?
  1. Get the current S&P 500 constituent list (Google that phrase and you should hit S&P's website; there is a link to download the constituents to an Excel file...check out the syntax of that link).
  2. Get the closing price (from Yahoo!Finance) of each of those companies. Again, check one company and look at the syntax of the link. There's a fast way to do this too (i.e., you don't have to hit their site 500 times...).
  3. Make a frequency table of the log of the prices (=FREQUENCY()) and divide them into 10 buckets.
  4. Make a graph of the buckets and the counts to see if prices in the S&P 500 are lognormally distributed (if they are, the histogram of log prices should look roughly like a bell curve)...
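Steps 3 and 4 can also be prototyped outside Excel. The sketch below is a Python/NumPy analogue, not the author's Excel solution: it splits log prices into 10 equal-width buckets with `np.histogram` and prints a crude text bar chart; if prices are lognormal, the bars should trace a rough bell curve. Since the post includes no data, the usage example draws synthetic lognormal prices as a stand-in for the closing prices you would collect in step 2. Note that `np.histogram` places bucket edges slightly differently from Excel's FREQUENCY (which counts values up to and including each bin's upper limit), so counts right at an edge may differ. The function names are hypothetical.

```python
import numpy as np

def log_price_buckets(prices, n_buckets=10):
    """Count log prices in n_buckets equal-width buckets; returns (counts, edges)."""
    prices = np.asarray(prices, dtype=float)
    if np.any(prices <= 0):
        raise ValueError("prices must be positive to take logs")
    counts, edges = np.histogram(np.log(prices), bins=n_buckets)
    return counts, edges

def print_text_histogram(counts, edges, width=40):
    """Print one row per bucket: log-price range, count, and a bar scaled to `width`."""
    peak = max(int(counts.max()), 1)
    for count, left, right in zip(counts, edges[:-1], edges[1:]):
        bar = "#" * int(round(width * int(count) / peak))
        print(f"{left:6.2f} to {right:6.2f} | {int(count):4d} {bar}")

# Synthetic stand-in for ~500 closing prices (NOT real S&P 500 data)
rng = np.random.default_rng(42)
prices = rng.lognormal(mean=3.5, sigma=0.8, size=500)
counts, edges = log_price_buckets(prices)
print_text_histogram(counts, edges)
```

With real constituent prices, a lopsided or multi-peaked chart would be a hint that the lognormal assumption does not hold for that cross-section.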

If anyone actually does this...let me know and show me your code; I'd be happy to check it out, since I already have this done in an Excel file.

Final Thoughts

I hope I haven't deterred anyone thinking about "trying it out" this summer, but programming is a commitment that will pay off in the long run. Just getting your feet wet is rewarding, as it teaches you to think in loops and conditions. Once you understand how long it takes to get good at programming, or how long it takes to complete a task, you will have a better sense of what is feasible within your project workspace.

2 comments:

Anonymous said...

just use FactSet :)

Mitch said...

FactSet is pretty kickass compared to Bloomberg for financials and screening. It also uses the northwood*? optimizer, which works some wonders (I prefer Barra myself). It's definitely a great alternative to using MarketQA and Bloomberg...