Cool things I have worked on: Customer Relationship Management (by alaric)
Continuing my previous series of blog posts about interesting things I've worked on in my career (an analytical database engine and a transactional database engine), I thought it'd be interesting to talk about something that's less groundbreaking technologically, but interesting in other ways.
My second ever full-time job was at a small company making a customer relationship management (CRM) system. The goal of a CRM is to track interactions with customers, thereby building a profile of them, and then to be able to isolate groups of interest and send them messages to try and entice them to buy more.
This being the early 2000s, the system architecture was a central Java server in front of a PostgreSQL database, with a downloaded Java Swing desktop app used as the client to connect to it. The system was multi-tenanted; each of our customers had multiple user accounts attached to their customer record, all able to access the customer's shared state. There was also a Web server that provided HTTP services (more on those later) and called the Java server for the "business logic", and an email processing service that handled incoming and outgoing email messages, which also talked to the Java server.
When I started working there, the core server was all based around Enterprise Java Beans, running on JBoss. However, this involved a lot of tedious boilerplate bean code, and I found JBoss quite inscrutable; when the system was slow, or just locked up as it was prone to do, we restarted it to see if it got better. JBoss had a JMX interface to ask about its internals, but it was all just a soup of pools and factories and seemingly unnecessary complexity; I never quite figured out how to tell what operations were running, what locks were held, or any of the other things you need to know to diagnose a problem.
So, one of the first interesting things we did was to tear JBoss out entirely. All the entity beans (classes that corresponded to database tables) were converted to plain old Java objects with methods to create/read/update/delete them in the database via direct SQL, cutting out the ORM. I wrote my own JDBC connection pool, and wrapped the session bean (the class that was actually instantiated for each client login) with a standard RMI server.
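The original pool's code isn't shown, so what follows is only a rough sketch of what a minimal fixed-size pool can look like. It is written generically over a factory; for JDBC, T would be java.sql.Connection and the factory a lambda around DriverManager.getConnection(...). The names SimplePool, acquire, release and allocatedCount are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical fixed-size object pool sketch. For JDBC, T would be java.sql.Connection
 * and the factory would wrap DriverManager.getConnection(...) (wiring not shown).
 */
final class SimplePool<T> {
    private final Supplier<T> factory;
    private final int maxSize;
    private final BlockingQueue<T> idle;
    private final AtomicInteger allocated = new AtomicInteger();

    public SimplePool(Supplier<T> factory, int maxSize) {
        this.factory = factory;
        this.maxSize = maxSize;
        this.idle = new ArrayBlockingQueue<>(maxSize);
    }

    /** Reuses an idle object, creates one if under the limit, otherwise waits for a release. */
    public T acquire() {
        T obj = idle.poll();
        if (obj != null) {
            return obj;
        }
        // Reserve a slot with compare-and-set so concurrent callers never exceed maxSize.
        for (int n = allocated.get(); n < maxSize; n = allocated.get()) {
            if (allocated.compareAndSet(n, n + 1)) {
                try {
                    return factory.get();
                } catch (RuntimeException e) {
                    allocated.decrementAndGet(); // give the slot back if creation failed
                    throw e;
                }
            }
        }
        try {
            return idle.take(); // pool exhausted: block until another caller releases
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
            throw new IllegalStateException("interrupted waiting for a pooled object", e);
        }
    }

    /** Returns an object to the pool; the queue holds maxSize items, so this cannot overflow. */
    public void release(T obj) {
        idle.offer(obj);
    }

    /** The "number of allocated connections" figure an admin snapshot could report. */
    public int allocatedCount() {
        return allocated.get();
    }
}
```

Usage: `SimplePool<Object> pool = new SimplePool<>(Object::new, 2);` then pair every `acquire()` with a `release(...)`, ideally in a `finally` block so an exception doesn't leak a pooled object.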
The effect of this was to make the core server suddenly take up much less memory, and it also stopped hanging. JBoss had way too much state hanging around; the new design only had the pool of currently active session objects and the JDBC connection pool, which meant there was much less that could go wrong.
But it was still often slow when the system was busy, and I wanted to find out why. I'd liked the idea of JMX, even if the implementation of it in JBoss was inscrutable, so I added a new interface to the Java server, available only to systems administrators, that would return a snapshot of the system's state. This involved:
- Some basic JVM-level stats about available memory and that sort of thing
- The number of allocated connections in the JDBC pool
- For each JDBC connection currently running a JDBC operation (converting a prepared statement into a result set or fetching rows from the result set, those being the two operations that caused real work to happen on the PostgreSQL server), the SQL text of the prepared statement and the values of the variables interpolated into it, and the accumulated run time of that prepared statement.
- For each prepared statement query text (bearing in mind there was a finite number of them encoded in the source, as all dynamic values were interpolated into the fixed query strings when the statement was prepared), a count of how many times it executed, its total run time, the minimum run time, the maximum run time, and the variables interpolated into it on the fastest and slowest runs.
- A list of all the threads in the JVM, with their thread titles, and the hierarchy of thread groups comprising them.
- For each exported RMI method, a count of how many times it had been executed, the total accumulated run time, and the shortest and longest run times. (I created a Stopwatch class, used by both this and the JDBC logging, which handled the collection of these kinds of statistics in a uniform way.)
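The Stopwatch class itself isn't shown, so the following is a hypothetical reconstruction of the statistics listed above: a count, total, minimum and maximum run time, plus the interpolated values from the fastest and slowest runs. Only the class name comes from the post; the fields and method names are assumptions. The idea is one instance per prepared-statement text or per RMI method.

```java
import java.util.Arrays;
import java.util.function.Supplier;

/** Hypothetical sketch of a per-operation statistics collector (one per query text or RMI method). */
final class Stopwatch {
    private long count;
    private long totalNanos;
    private long minNanos = Long.MAX_VALUE;
    private long maxNanos = Long.MIN_VALUE;
    private Object[] fastestArgs = new Object[0];
    private Object[] slowestArgs = new Object[0];

    /** Runs the task, timing it and recording the arguments it ran with, even if it throws. */
    public <T> T time(Supplier<T> task, Object... args) {
        long start = System.nanoTime();
        try {
            return task.get();
        } finally {
            record(System.nanoTime() - start, args);
        }
    }

    /** Records one completed run; synchronized because many request threads share an instance. */
    public synchronized void record(long elapsedNanos, Object... args) {
        count++;
        totalNanos += elapsedNanos;
        if (elapsedNanos < minNanos) {
            minNanos = elapsedNanos;
            fastestArgs = args.clone();
        }
        if (elapsedNanos > maxNanos) {
            maxNanos = elapsedNanos;
            slowestArgs = args.clone();
        }
    }

    public synchronized long count() { return count; }
    public synchronized long totalNanos() { return totalNanos; }
    public synchronized long minNanos() { return minNanos; }
    public synchronized long maxNanos() { return maxNanos; }
    public synchronized Object[] fastestArgs() { return fastestArgs.clone(); }
    public synchronized Object[] slowestArgs() { return slowestArgs.clone(); }

    @Override
    public synchronized String toString() {
        if (count == 0) {
            return "never run";
        }
        return String.format("count=%d total=%dns min=%dns max=%dns fastest=%s slowest=%s",
                count, totalNanos, minNanos, maxNanos,
                Arrays.toString(fastestArgs), Arrays.toString(slowestArgs));
    }
}
```

Keeping the arguments from the extreme runs is what makes this useful for diagnosis: a slow maximum for a query usually points at a specific parameter value (a huge customer, say) rather than at the query as a whole.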
This was all trivially available information from the JVM, apart from the JDBC stuff, which I made my custom JDBC pool track for me as it proxied the requests through to the PostgreSQL JDBC driver.
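As an illustration of how "trivially available" the JVM-level parts are, here is a sketch using only standard APIs: Runtime for the memory figures, and Thread.getAllStackTraces() plus ThreadGroup.getParent() to rebuild the thread-group hierarchy. JvmSnapshot and its methods are hypothetical names, and the output format is an assumption; the post doesn't say what the original looked like.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper rendering the JVM-level parts of an admin snapshot. */
final class JvmSnapshot {
    static String memoryStats() {
        Runtime rt = Runtime.getRuntime();
        long free = rt.freeMemory();
        long total = rt.totalMemory();
        return String.format("heap used=%d free=%d total=%d max=%d bytes",
                total - free, free, total, rt.maxMemory());
    }

    /** Lists every live platform thread by name, nested under its chain of thread groups. */
    static String threadTree() {
        Map<ThreadGroup, List<ThreadGroup>> childGroups = new LinkedHashMap<>();
        Map<ThreadGroup, List<Thread>> threadsByGroup = new LinkedHashMap<>();
        ThreadGroup root = null;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            ThreadGroup group = t.getThreadGroup();
            if (group == null) {
                continue; // the thread terminated after the map was built
            }
            threadsByGroup.computeIfAbsent(group, g -> new ArrayList<>()).add(t);
            // Record each parent-child link up to the root so the tree can be walked top-down.
            ThreadGroup child = group;
            for (ThreadGroup parent = child.getParent(); parent != null; parent = child.getParent()) {
                List<ThreadGroup> siblings = childGroups.computeIfAbsent(parent, g -> new ArrayList<>());
                if (!siblings.contains(child)) {
                    siblings.add(child);
                }
                child = parent;
            }
            root = child;
        }
        StringBuilder out = new StringBuilder();
        if (root != null) {
            render(root, 0, childGroups, threadsByGroup, out);
        }
        return out.toString();
    }

    private static void render(ThreadGroup group, int depth,
                               Map<ThreadGroup, List<ThreadGroup>> childGroups,
                               Map<ThreadGroup, List<Thread>> threadsByGroup,
                               StringBuilder out) {
        String indent = "  ".repeat(depth);
        out.append(indent).append("[group] ").append(group.getName()).append('\n');
        for (Thread t : threadsByGroup.getOrDefault(group, List.of())) {
            out.append(indent).append("  ").append(t.getName())
               .append(" (").append(t.getState()).append(")\n");
        }
        for (ThreadGroup child : childGroups.getOrDefault(group, List.of())) {
            render(child, depth + 1, childGroups, threadsByGroup, out);
        }
    }
}
```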
I also made the RMI server wrap the session class with a proxy that, when an RMI request came in, would set the thread's title to the name of the connected user account and the name of the method, and then reset it to <idle> when the RMI request terminated. This meant that the list of threads would contain a list of all currently running RMI requests. For a few methods that did more complicated things internally, I wrapped the steps in code to extend the title with more detail as to the current step, then restore the original title when it completed, so the thread list also reflected what sub-steps of each top-level RMI method were running where applicable.
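A minimal sketch of that retitling trick with java.lang.reflect.Proxy, assuming the thread "title" is the name set via Thread.setName. Session, lookupCustomer, ThreadTitles, withTitles and step are hypothetical names; a real session interface would extend java.rmi.Remote and be exported through RMI, which is left out here.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

/** Hypothetical stand-in for the real RMI session interface. */
interface Session {
    String lookupCustomer(int id);
}

final class ThreadTitles {
    static final String IDLE = "<idle>";

    /** Wraps target so each call renames the current thread to "user: method" while it runs. */
    @SuppressWarnings("unchecked")
    static <T> T withTitles(Class<T> iface, T target, String userName) {
        InvocationHandler handler = (proxy, method, args) -> {
            Thread thread = Thread.currentThread();
            thread.setName(userName + ": " + method.getName());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the method's own exception, not the reflective wrapper
            } finally {
                thread.setName(IDLE); // the request is over, so mark the thread as idle again
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    /** Appends a sub-step description to the current title while step runs, then restores it. */
    static void step(String description, Runnable step) {
        Thread thread = Thread.currentThread();
        String original = thread.getName();
        thread.setName(original + " / " + description);
        try {
            step.run();
        } finally {
            thread.setName(original);
        }
    }
}
```

One quirk of this approach: toString, hashCode and equals calls on the proxy also go through the handler, so they briefly retitle the thread too. A real version might skip methods declared on Object.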
So on top of the system now having the minimal number of "moving parts", there was now an interface to see what the current position of all the moving parts was. In a rare triumph of "not invented here syndrome", I'd replaced JBoss with something that did the same job... but in a way I liked 🙂
And that gave me the tools to work out what the performance problems were.