11.13.08
Azure clouds (but in what color skies?)
Microsoft Azure is being promoted to us as a cloud computing system, and we heard David Chou present on the benefits of cloud computing and when it is (and isn’t) a good solution.
Rent, not buy
We’re talking about resources provided as services by companies with the full-scale technology to handle them as they grow. Suppose you own a small business. You could pay an exorbitant price for a software license upfront, before your company has made that kind of money. You could build and maintain your own data center, at a cost out of proportion to what you’ll gain from it (is it cost-efficient for a small business to keep the IT department maintaining a data center in a proper environment 24-7?). Or you can take advantage of a solution that will easily scale as your business grows, without as much overhead laid out at the start.
Cloud computing is about subscriptions to resources of various types: purchased applications rather than ones built in-house, using external hosting rather than running your own server, Software as a Service rather than running the applications yourself, and so on. In these cases, it’s more efficient for large-scale companies dedicated to these tasks to do them than it is for every business to do them itself, so those companies can sell them for a lower price than the cost for each business to meet its own computing needs in-house.
When is cloud computing useful?
Control vs. economy
The more you create your own solutions (in-house applications, maintaining your own data, hosting yourself), the more control you have over them. On the other hand, outsourcing gives you economies of speed and scale. You have to find the balance that works best for a given task. On-site solutions prioritize data consistency, integrity, and a focus on commits. Cloud computing, on the other hand, is best when prioritizing accessibility.
So an online retailer might want to use cloud computing to host their catalog, because the important thing is speed and accessibility. This is the BASE solution (Basically Available, Soft state, Eventually consistent), which treats availability as of prime importance and accepts that a best effort is good enough; if the customer refreshes to discover that an item which was showing as available has become sold out, this is not a disaster. But when dealing with the financial end of things and processing transactions, it would be best to switch to on-site computing and the ACID solution (Atomicity, Consistency, Isolation, Durability), which stresses accuracy more than speed, because it’s more important that the transaction data be correct.
Ever gone to an online retailer and found that their catalog loads plenty fast, but their transaction takes longer to process? Of course we never really enjoy it when a website is slow, but it’s not much of a problem at that point; by the time we make a purchase, we’re not trying to quickly browse through to find more results. The important thing at that point is that the purchase goes through correctly and we are charged the right amount and receive the right product. Imagine that it worked the other way around, with the catalog in ACID and the financial transactions in BASE. You’d have a slow catalog that would take forever to load, possibly causing users to give up and go to another site. If they did stick around and make a purchase, their financial transaction would go through quickly, but it wouldn’t be in the company’s control or have as much data integrity; the more hands the transaction passes through, the more chances there are for something to go wrong.
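To make the ACID side of this concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, the purchase function, and the customer_balance check are all made up for illustration; a real checkout would look quite different. The point is only atomicity: either the stock decrement and the charge both happen, or neither does.

```python
import sqlite3


def make_store():
    """Create a throwaway in-memory store with one widget in stock."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
    conn.execute("CREATE TABLE charges (item TEXT, amount INTEGER)")
    conn.execute("INSERT INTO stock VALUES ('widget', 1)")
    conn.commit()
    return conn


def purchase(conn, item, price, customer_balance):
    """Decrement stock and record the charge as one atomic unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            cur = conn.execute(
                "UPDATE stock SET qty = qty - 1 WHERE item = ? AND qty > 0",
                (item,),
            )
            if cur.rowcount == 0:
                raise ValueError("sold out")
            if customer_balance < price:
                # The stock was already decremented above; raising here
                # undoes that change along with everything else.
                raise ValueError("insufficient funds")
            conn.execute(
                "INSERT INTO charges (item, amount) VALUES (?, ?)",
                (item, price),
            )
        return True
    except ValueError:
        return False


conn = make_store()
declined = purchase(conn, "widget", price=50, customer_balance=10)
qty_after_decline = conn.execute(
    "SELECT qty FROM stock WHERE item = 'widget'"
).fetchone()[0]
accepted = purchase(conn, "widget", price=50, customer_balance=100)
charges = conn.execute("SELECT COUNT(*) FROM charges").fetchone()[0]
```

The declined purchase leaves the widget in stock, because the half-finished transaction is rolled back rather than left partly applied.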
CAP theorem
CAP stands for Consistency, Availability, and Partition tolerance; the CAP theorem says that a distributed system can guarantee at most two of these three at the same time. Thus, we have to decide which two are most important to us. There are times when we will need to sacrifice economy of scale in order to ensure data integrity and consistency; however, there are times when what we need is the fastest, most accessible solution to our problem. This is where cloud computing excels.
Like most technologies, cloud computing is a tradeoff; the important thing is knowing which situations are right for it, and which are not.
09.16.08
Business Rules Engine – What, why, and how?
A business rules engine (BRE) can sound complicated, but it’s not difficult in principle. No one ever explained it to me, though, so I had to do my own research. This is the information that helped me; I hope it helps you.
What?
BREs are software systems designed to evaluate data against rules created by business experts. The rules describe what the business experts want to happen, and the engine uses the rules to crunch the data and calculate what should be done. “Rules”, in this case, really mean the decisions made by the people in suits upstairs: their ideas of when to sell what, the promotions they want to run and for how long, or how good someone’s credit score has to be for them to get a loan.
A typical BRE might consist of four components:
1) Rules Editor – This is where the rules are designed and tested.
2) Rules Repository – This is where the rules are stored so that applications can access them.
3) Rules Engine – This executes the rules.
4) Rules Administration Component – This modifies, updates, and allows remote administration of the rules, usually via the web.
Why?
Up To Speed
A BRE takes the business decisions out of the hard-coded software so that the software itself doesn’t have to be changed every time someone’s policy does. Instead, the hard-coded software is just an engine that adapts itself to follow whatever rules you give it. You can feed the rules to the machine, confident that it will do its thing, and its inner workings can be abstracted away.
In some businesses, the policies and the conditions on which decisions get made are changing all the time. Tax calculations, market rates, promotional discounts, and credit ratings are all examples of things that change too quickly for programmers to keep changing their software every time the conditions do. A BRE helps companies keep up with the current conditions.
This way, every time the high-flying people in suits decide to change things around, they don’t have to call up all the programmers and tell them to do the software over. They just change the rules. The programmers can work on more interesting things (and be more productive; substitute one phrase or the other depending on whether you’re trying to convince the IT people or their boss).
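As a rough illustration of "they just change the rules", here is a small Python sketch in which the rules live in JSON text rather than in the code. The rule format (field, op, value, action) and the function names are invented for this example and are not taken from any particular BRE product.

```python
import json
import operator

# Comparison names the hypothetical rule format is allowed to use.
OPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def load_rules(text):
    """Parse rules from JSON text, standing in for a rules repository."""
    return json.loads(text)


def decide(rules, record):
    """Return the action of the first rule whose condition holds, else None."""
    for rule in rules:
        compare = OPS[rule["op"]]
        if compare(record[rule["field"]], rule["value"]):
            return rule["action"]
    return None


# Policy as first written, then after the people upstairs relax it.
rules_v1 = '[{"field": "credit_score", "op": ">=", "value": 700, "action": "approve"}]'
rules_v2 = '[{"field": "credit_score", "op": ">=", "value": 650, "action": "approve"}]'

applicant = {"credit_score": 680}
before = decide(load_rules(rules_v1), applicant)
after = decide(load_rules(rules_v2), applicant)
```

The same applicant gets a different answer under the second policy, and decide itself never changed; only the rule text did.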
Thinking Clearly
A BRE is sometimes pitched as artificial intelligence that can make smart decisions. So is it going to do all your work for you? Not really. Formulating the business policy into a set of rules that will do what you want them to is not a simple task. You’ll still have to feed the engine the right rules if you want it to do the right things with your data. And, of course, the BRE is only as smart as the rules you give it.
It’s useful as a way to make sure that the people who know business can make the business decisions, and the people who know programming can make the software; everyone is doing just what they are good at, and on either end there is less guesswork about what the other side plans to do.
The nice thing about the business rules paradigm is that you can clearly separate the business logic from the underlying mechanisms that make it go. The rules come ultimately from the policy-makers, while the engine that implements them comes from people who understand good software design. You can see them as distinct matters with their own goals, rather than as one big muddied task. This can help you think more clearly about the requirements of the task.
Another way BREs can help us think more clearly about a program is the separation between rules and data. This is a lot like separating methods from objects. The data is a thing, and the rules tell the program what to do about that thing. This is already the way that we think when we use object-oriented design, so it isn’t anything new and hard to get used to.
How?
Rules + Data = Decision
The rules tell the engine what to do with the data. Rules consist of IF – THEN or IF – THEN – ELSE statements: IF this customer’s credit rating is above x, THEN approve customer’s loan. Alternately, IF product foo is selling at least x units, THEN make more of product foo, ELSE make more of best-selling product instead.
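The two example rules above could be sketched in Python roughly like this. The Rule class, its field names, and the keys in the data dictionaries are assumptions made for illustration; real engines usually have their own rule language.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """An IF - THEN rule with an optional ELSE branch."""
    condition: Callable[[dict], bool]
    then: Callable[[dict], str]
    otherwise: Optional[Callable[[dict], str]] = None

    def evaluate(self, data):
        """Return the decision for this data, or None if the rule is silent."""
        if self.condition(data):
            return self.then(data)
        if self.otherwise is not None:
            return self.otherwise(data)
        return None


# IF credit rating is above x, THEN approve the loan (no ELSE branch).
loan_rule = Rule(
    condition=lambda d: d["credit_rating"] > d["threshold"],
    then=lambda d: "approve loan",
)

# IF foo sells at least x units, THEN make more foo, ELSE make the best seller.
production_rule = Rule(
    condition=lambda d: d["foo_units_sold"] >= d["target_units"],
    then=lambda d: "make more foo",
    otherwise=lambda d: "make more " + d["best_seller"],
)

loan_decision = loan_rule.evaluate({"credit_rating": 720, "threshold": 700})
production_decision = production_rule.evaluate(
    {"foo_units_sold": 40, "target_units": 100, "best_seller": "bar"}
)
```

A rule without an ELSE branch simply has nothing to say when its condition fails, which is why evaluate returns None in that case.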
You can have a data-driven system that executes rules whenever the present data allows it to do so, a goal-driven (or query-driven) system that actively goes out looking for data to match the rule, or a bi-directional system that does both.
Forward and Backward
Forward chaining is the method used by a data-driven system. The system finds a rule whose IF condition matches the data, and triggers the THEN clause of the rule. When the THEN clause is triggered, the state of the data changes, so the machine goes and looks again for another rule with an IF condition that is met. It keeps on doing this until none of the rules apply. Then, it can’t execute any more of the rules until the state of the data changes again.
The forward-chaining approach is sometimes called a production system because each rule is its own little procedure. Added up, they produce something bigger. It’s often useful to try a production system when there are a lot of different ways to do the task, and you need the system to do it some way or another. You can give the system rules on how to do the task in a reasonable way, and it will do it. Forward chaining is also good at dealing with dependencies; when the condition depended on is met, the engine will see that the rule is ready to fire, and execute it.
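Here is a minimal forward-chaining loop in Python, as a sketch only. It assumes a very simple rule shape, a set of premises and a single conclusion, and the kitchen facts are invented for the example; production engines use much more efficient matching than this repeated scan.

```python
def forward_chain(rules, facts):
    """Fire rules whose premises are all known until no rule adds a new fact."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule is ready to fire when every premise is a known fact.
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True  # the data changed, so scan the rules again
    return facts


# The second rule's result is a dependency of the first, so the first
# only becomes ready to fire on a later pass.
RULES = [
    ({"can make omelette", "have bread"}, "can make breakfast"),
    ({"have eggs", "have cheese"}, "can make omelette"),
]

derived = forward_chain(RULES, {"have eggs", "have cheese", "have bread"})
without_bread = forward_chain(RULES, {"have eggs", "have cheese"})
```

Note that the loop stops on its own once a full pass over the rules adds nothing, which matches the "keeps going until none of the rules apply" behaviour described above.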
Backward chaining is used by a goal-driven system. Instead of looking at the data and asking where it can go from there, it looks at the goal and tries to figure out how to meet it. If there is no rule that can be executed to immediately produce the goal, it sets sub-goals and tries to meet those. The sub-goals are conditions that need to be met so that one of the rules that leads to the ultimate goal can be executed.
The backward-chaining approach is often good at picking the best possible option out of many possibilities. Instead of finding anything that applies and running with it to get some result, it looks at every rule and tries to figure out what combination will produce the result it wants the most. It also gathers data as needed, rather than requiring an initially populated dataset of all the necessary information. Backward chaining is also good at managing subgoals and progressing in the right direction.
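A backward-chaining counterpart can be sketched the same way, again assuming rules are just a set of premises plus one conclusion. This toy version only answers whether a goal can be reached; it does not rank alternatives or ask for missing data, which fuller systems may do.

```python
def backward_chain(goal, rules, facts, _seen=frozenset()):
    """Return True if the goal is a known fact or can be derived from the rules."""
    if goal in facts:
        return True
    if goal in _seen:
        # Already trying to prove this goal further up; avoid looping forever.
        return False
    for premises, conclusion in rules:
        if conclusion != goal:
            continue
        # Each premise of a candidate rule becomes a sub-goal.
        if all(backward_chain(p, rules, facts, _seen | {goal}) for p in premises):
            return True
    return False


RULES = [
    ({"can make omelette", "have bread"}, "can make breakfast"),
    ({"have eggs", "have cheese"}, "can make omelette"),
]

reachable = backward_chain(
    "can make breakfast", RULES, {"have eggs", "have cheese", "have bread"}
)
unreachable = backward_chain("can make breakfast", RULES, {"have eggs", "have bread"})
```

Starting from the goal means the function only ever looks at rules that could lead to it, instead of firing everything the data happens to allow.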
I would love to be able to explain how bi-directional chaining works, but it’s hard to find information about it on the web. I wish I knew more about it, myself. All I know is that it exists, and is probably complicated.
Advice on Better Use
The reason we consider both forward and backward chaining is that the question of which is better depends on the individual case. Any problem that can be solved by a forward-chaining system can be solved by a backward-chaining system, and vice versa, but sometimes it is far easier to state a problem in one system than the other.
It may also be more efficient computationally to use one or the other; a problem with many good solutions might make use of forward chaining to quickly find something that works, whereas a large number of rules and a large amount of data might suggest backward chaining so that you don’t have to load and examine as much information. A forward-chaining system will produce a reasonable solution when there are many ways the problem could be solved; a backward-chaining system will try to match the conditions as closely as possible to the goal.
So if you don’t know what you want for dinner, forward chaining will help you make something, but if you know exactly what you want, backward chaining will help you figure out how to make it.
Acknowledgements
The author would like to express gratitude to the following sites:
Building Expert Systems in Prolog
AspAlliance: Understanding Business Process Rules
JBossRules Wiki
Pathfinder Development – Forward Versus Backward Chaining
Jocelyn Paine’s lecture notes
Sun Developer Network – Getting Started with the Java Rule Engine API
And, of course, Wikipedia.