Thursday 25 February 2010

Bind variables - the key to application performance, but they rule out star transformations in a data warehouse.

If you've been developing applications on Oracle for a while, you've no doubt come across the concept of «Bind Variables». Bind variables are one of those Oracle concepts that experts frequently cite as being key to application performance, but it's often not all that easy to pin down exactly what they are and how you need to alter your programming style to use them.

To understand bind variables, consider an application that generates thousands of SELECT statements against a table; for example:

SELECT fname, lname, pcode FROM cust WHERE id = 674;
SELECT fname, lname, pcode FROM cust WHERE id = 234;
SELECT fname, lname, pcode FROM cust WHERE id = 332;

Each time the query is submitted, Oracle first checks in the shared pool to see whether this statement has been submitted before. If it has, the execution plan that this statement previously used is retrieved, and the SQL is executed. If the statement cannot be found in the shared pool, Oracle has to go through the process of parsing the statement, working out the various execution paths and coming up with an optimal access plan before it can be executed. This process is known as a «hard parse» and for OLTP applications can actually take longer to carry out than the DML statement itself.

When looking for a matching statement in the shared pool, only statements that exactly match the text of a previous statement are considered; so, if every SQL statement you submit is unique (in that the predicate changes each time, from id = 674 to id = 234 and so on) then you'll never get a match, and every statement you submit will need to be hard parsed. Hard parsing is very CPU intensive, and involves obtaining latches on key shared memory areas; whilst this might not affect a single program running against a small set of data, it can bring a multi-user system to its knees if hundreds of copies of the program are trying to hard parse statements at the same time. The extra bonus with this problem is that contention caused by hard parsing is pretty much immune to measures such as increasing available memory, numbers of processors and so on, as hard parsing is one thing Oracle can't do concurrently with many other operations, and it's a problem that often only comes to light when trying to scale up a development system from a single user working on a subset of records to many hundreds of users working on a full data set.
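You can actually watch this happening by querying the shared pool. Here's a quick sketch (it assumes you have access to the V$SQL view, and that the three literal queries above have just been run):

SQL> select sql_text, parse_calls, executions
from v$sql
where sql_text like 'SELECT fname, lname, pcode FROM cust%';

Each literal statement shows up as its own cursor, hard parsed once and executed once; the bind variable version introduced below would appear as a single cursor with one parse and many executions.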

The way to get Oracle to reuse the execution plans for these statements is to use bind variables. Bind variables are «substitution» variables that are used in place of literals (such as 674, 234, 332) and that have the effect of sending exactly the same SQL to Oracle every time the query is executed. For example, in our application, we would just submit

SELECT fname, lname, pcode FROM cust WHERE id = :cust_no;

and this time we would be able to reuse the execution plan every time, reducing the latch activity in the SGA, and therefore the total CPU activity, which has the effect of allowing our application to scale up to many users on a large dataset.

Bind Variables in SQL*Plus

In SQL*Plus you can use bind variables as follows:

SQL> variable deptno number
SQL> exec :deptno := 10
SQL> select * from emp
where deptno = :deptno;

What we've done to the SELECT statement now is take the literal value out of it, and replace it with a placeholder (our bind variable), with SQL*Plus passing the value of the bind variable to Oracle when the statement is processed. This bit is fairly straightforward (you declare a bind variable in SQL*Plus, then reference it in the SELECT statement).
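As an aside, SQL*Plus can also display the current value of a bind variable using the PRINT command, which is handy for checking what was actually passed:

SQL> print deptno

    DEPTNO
----------
        10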

Bind Variables in PL/SQL

Turning now to PL/SQL, the good news is that PL/SQL itself takes care of most of the issues to do with bind variables, to the point where most of the code you write already uses bind variables without you knowing. Take, for example, the following bit of PL/SQL:

create or replace procedure dsal(p_empno in number)
as
begin
update emp
set sal=sal*2
where empno = p_empno;
commit;
end;
/

Now you might be thinking that you've got to replace the p_empno with a bind variable. However, the good news is that every reference to a PL/SQL variable is in fact a bind variable.
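Behind the scenes, Oracle receives the statement with the PL/SQL variable already converted into a bind variable, so what actually gets parsed looks something like this (the :b1 placeholder name is illustrative - Oracle generates its own):

update emp set sal=sal*2 where empno = :b1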

Dynamic SQL

In fact, the only time you need to consciously decide to use bind variables when working with PL/SQL is when using Dynamic SQL.

Dynamic SQL allows you to execute a string containing SQL using the EXECUTE IMMEDIATE command. The following example, for instance, would always require a hard parse each time it is submitted:

create or replace procedure dsal(p_empno in number)
as
begin
execute immediate
'update emp set sal = sal*2 where empno = ' || p_empno;
commit;
end;
/

The way to use bind variables instead is to change the EXECUTE IMMEDIATE command as follows:

create or replace procedure dsal(p_empno in number)
as
begin
execute immediate
'update emp set sal = sal*2 where empno = :x' using p_empno;
commit;
end;
/

And that's all there is to it. One thing to bear in mind, though, is that you can't substitute actual object names (tables, views, columns etc.) with bind variables - you can only substitute literals. If the object name is generated at runtime, you'll still need to concatenate those parts into the string, and the SQL will only match a statement already in the shared pool when the same object name comes up. However, whenever you're using dynamic SQL to build up the predicate part of a statement, use bind variables instead and you'll dramatically reduce the amount of latch contention going on.
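To illustrate the mixed case, here's a minimal sketch (the dsal_any name and the p_table parameter are made up for the example): the table name has to be concatenated into the string, but the predicate value is still bound, so each distinct table at least gets one shareable statement. Bear in mind that concatenating an object name like this is a classic SQL injection risk, so in real code you'd validate the value first.

create or replace procedure dsal_any(p_table in varchar2, p_empno in number)
as
begin
-- object name concatenated (can't be bound); predicate value bound
execute immediate
'update ' || p_table || ' set sal = sal*2 where empno = :x'
using p_empno;
commit;
end;
/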

The Performance Killer

Just to give you an idea of how huge a difference this can make to performance, you only need to run a very small test:

Here is the Performance Killer ....

SQL> alter system flush shared_pool;
SQL> set serveroutput on;

declare
type rc is ref cursor;
l_rc rc;
l_dummy all_objects.object_name%type;
l_start number default dbms_utility.get_time;
begin
for i in 1 .. 1000
loop
open l_rc for
'select object_name from all_objects where object_id = ' || i;
fetch l_rc into l_dummy;
close l_rc;
-- dbms_output.put_line(l_dummy);
end loop;
dbms_output.put_line
(round((dbms_utility.get_time-l_start)/100, 2) ||
' Seconds...' );
end;
/
101.71 Seconds...

... and here is the Performance Winner:

declare
type rc is ref cursor;
l_rc rc;
l_dummy all_objects.object_name%type;
l_start number default dbms_utility.get_time;
begin
for i in 1 .. 1000
loop
open l_rc for
'select object_name from all_objects where object_id = :x'
using i;
fetch l_rc into l_dummy;
close l_rc;
-- dbms_output.put_line(l_dummy);
end loop;
dbms_output.put_line
(round((dbms_utility.get_time-l_start)/100, 2) ||
' Seconds...' );
end;
/
1.9 Seconds...

That is pretty dramatic. The fact is that not only does this execute much faster (we spent more time PARSING our queries than actually EXECUTING them!), it will also let more users use your system simultaneously.

Bind Variables in VB, Java and other applications

The next question, though, is what about VB, Java and other applications that fire SQL queries against an Oracle database? How do these use bind variables? Do you in fact have to split your SQL into two statements, one to set the bind variable, and one for the statement itself?

The answer is actually quite simple. When you put together an SQL statement using Java, or VB, or whatever, you usually use an API to access the database: ADO in the case of VB, JDBC in the case of Java. All of these APIs have built-in support for bind variables, and it's just a case of using this support rather than concatenating a string yourself and submitting it to the database.

For example, Java has PreparedStatement, which allows the use of bind variables, and Statement, which uses the string concatenation approach. If you use the method that supports bind variables, the API itself passes the bind variable value to Oracle at runtime, and you just submit your SQL statement as normal. There's no need to separately pass the bind variable value to Oracle, and no additional work on your part. Support for bind variables isn't limited to Oracle - it's common to other RDBMS platforms such as Microsoft SQL Server, so there's no excuse for avoiding them on the grounds that they might be an Oracle-only feature.

Lastly, it's worth bearing in mind that there are some instances where bind variables are probably not appropriate: usually where, instead of your query being executed many times a second (as with OLTP systems), your query actually takes several seconds, or minutes, or hours to execute - a situation you get in decision support and data warehousing. In this instance, the time taken to hard parse your query is only a small proportion of the total query execution time, and the benefit of avoiding a hard parse is probably outweighed by the reduction in the information you're making available to the query optimizer - by substituting the actual predicate value with a bind variable, you're removing the optimizer's ability to compare your value with the data distribution in the column, which might make it opt for a full table scan or an index when this isn't appropriate. Oracle 9i helps deal with this using a feature known as bind variable peeking, which allows Oracle to look at the value behind a bind variable the first time the statement is parsed, to help choose the best execution plan.
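To make that concrete, here's a sketch using a hypothetical SALES table whose REGION column is heavily skewed:

-- With literals, the optimizer can check each value against the
-- column's histogram and choose a different plan per value:
select sum(amount) from sales where region = 'NORTH'; -- rare value: use the index
select sum(amount) from sales where region = 'SOUTH'; -- most of the table: full scan
-- With a bind variable, one shared plan has to serve every value:
select sum(amount) from sales where region = :region;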

Another potential drawback with bind variables in data warehousing queries is that using them rules out star transformations, taking away a powerful option for efficiently joining fact and dimension tables in a star schema.

Wednesday 10 February 2010

How should you approach a new data profiling engagement?


Data profiling is best scheduled prior to system design, typically occurring during the discovery or analysis phase. The first step -- and also a critical dependency -- is to clearly identify the appropriate person to provide the source data and also serve as the “go to” resource for follow-up questions. Once you receive source data extracts, you’re ready to prepare the data for profiling. As a tip, loading data extracts into a database structure will allow you to freely write SQL to query the data while also having the flexibility to use a profiling tool if needed.
When creating or updating a data profile, start with basic column-level analysis such as the following (a combined SQL sketch of these checks appears after this list):
Distinct count and percent: Analyzing the number of distinct values within each column will help identify possible unique keys within the source data (which I’ll refer to as natural keys). Identification of natural keys is a fundamental requirement for database and ETL architecture, especially when processing inserts and updates. In some cases, this information is obvious based on the source column name or through discussion with source data owners. However, when you do not have this luxury, distinct percent analysis is a simple yet critical tool to identify natural keys.
Zero, blank, and NULL percent: Analyzing each column for missing or unknown data helps you identify potential data issues. This information will help database and ETL architects set up appropriate default values or allow NULLs on the target database columns where an unknown or untouched (i.e., NULL) data element is an acceptable business case. This analysis may also spawn exception or maintenance reports for data stewards to address as part of day-to-day system maintenance.
Minimum, maximum, and average string length: Analyzing string lengths of the source data is a valuable step in selecting the most appropriate data types and sizes in the target database. This is especially true in large and highly accessed tables where performance is a top consideration. Reducing the column widths to be just large enough to meet current and future requirements will improve query performance by minimizing table scan time. If the respective field is part of an index, keeping the data types in check will also minimize index size, overhead, and scan times.
Numerical and date range analysis: Gathering information on minimum and maximum numerical and date values is helpful for database architects to identify appropriate data types to balance storage and performance requirements. If your profile shows a numerical field does not require decimal precision, consider using an integer data type because of its relatively small size. Another issue which can easily be identified is converting Oracle dates to SQL Server. Until SQL Server 2008, the earliest possible datetime date was 1/1/1753 which often caused issues in conversions with Oracle systems.
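As a minimal sketch of these basic checks, assume the extract has been loaded into a hypothetical CUST table and we're profiling its EMAIL column (Oracle syntax; the table and column names are made up for the example):

select count(*) as row_cnt,
       count(distinct email) as distinct_cnt,
       round(count(distinct email) * 100 / count(*), 2) as distinct_pct,
       round(sum(case when email is null then 1 else 0 end) * 100 / count(*), 2) as null_pct,
       min(length(email)) as min_len,
       max(length(email)) as max_len,
       round(avg(length(email)), 1) as avg_len
from cust;

A distinct percent near 100 flags a natural key candidate, while a high NULL percent flags a column that needs a default value or a data steward's attention.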
With the basic data profile under your belt, you can conduct more advanced analysis such as:
Key integrity: After your natural keys have been identified, check their overall integrity by applying the zero, blank, and NULL percent analysis to the key columns. In addition, checking the related data sets for any orphan keys is extremely important to reduce downstream issues (see the SQL sketch after this list). For example, all customer keys from related transactions (e.g., orders) should exist in the customer base data set; otherwise you risk understating aggregations grouped by customer-level attributes.
Cardinality: Identification of the cardinality (e.g. one-to-one, one-to-many, many-to-many, etc.) between the related data sets is important for database modeling and business intelligence (BI) tool set-up. BI tools especially need this information to issue the proper inner- or outer-join clause to the database. Cardinality considerations are especially apparent for fact and dimension relationships.
Pattern, frequency distributions, and domain analysis: Examination of patterns is useful for checking that data fields are formatted correctly. For example, you might validate e-mail address syntax to ensure it conforms to user@domain. This type of analysis can be applied to most columns but is especially practical for fields that are used for outbound communication channels (e.g., phone numbers and address elements). Frequency distributions are typically simple validations such as “customers by state” or “total of sales by product” and help to authenticate the source data before designing the database. Domain analysis is validation of the distribution of values for a given data element. Basic examples of this include validating customer attributes such as gender or birth date, or address attributes such as valid states or provinces within a specified region. Although these steps may not play as critical a role in designing the system, they are very useful for uncovering new and old business rules.
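For example, the orphan-key check described above reduces to a simple outer join (again assuming hypothetical ORDERS and CUST tables sharing a CUST_ID key):

select o.cust_id, count(*) as orphan_rows
from orders o
left join cust c on c.cust_id = o.cust_id
where c.cust_id is null
group by o.cust_id;

Any rows returned are transaction keys with no matching customer record - exactly the situation that understates customer-level aggregations.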
Picking the right techniques depends on the project objectives. If you’re building a new database from scratch, take the time to execute and review outcomes of each of the above bullet points. If you’re simply integrating a new data set into an existing database, select the most applicable tasks that apply to your source data.
All of these steps may be conducted by writing raw SQL, and the basic profiling steps can usually also be accomplished with a tool designed specifically for data profiling. Many third-party data profiling tools have been introduced into the marketplace over the last several years to help streamline the process. These tools typically allow the user to point at a data source and select the appropriate profiling technique(s) to apply. The outputs of these tools vary, but usually a data source summary is produced with the field-level profile statistics.

Furthermore, understanding the available data, the missing data, and the required data can help map out future technical strategies and data capture methods. Using this information to improve data capture techniques will improve the source data integrity and may lead to further improvements in overall customer engagement and intelligence.
The data profiling process may seem arduous and less than glamorous at times, but it is an important step which adds value to any database project.

Wednesday 3 February 2010

What is Security Clearance in the UK?

What is Security Clearance?
National Security Vetting is carried out so that people can work on, or carry out, tasks that require national security clearance. Government organisations including the Ministry of Defence, Central Government, Defence Estates and the Armed Forces require Security Cleared personnel, as do companies in the private sector contracted to undertake work for these bodies. Security Clearance levels vary depending upon the sensitivity of the information that is accessed.
The main Security Clearing bodies are:
The Defence Vetting Agency (DVA)
Foreign and Commonwealth Office (FCO)
Metropolitan Police Service (MPS)
How to obtain Security Clearance
You cannot apply for Security Clearance as an individual. Clearance is requested by an employer and carried out by Government agencies. Security Clearance is granted for a specific period of time depending on the employment term or for a particular project.
Security Clearance can be verified and transferred to a new employer if required. If you do not have the Security Clearance required for a particular role you would not be able to start your employment until clearance has been obtained.
You do not have to be a British National in order to gain Security Clearance, but you will have to meet the following criteria depending on the level of clearance required.
There are four main types of national security vetting and clearances:
Developed Vetting (DV): This is the highest level of Security Clearance and is required for people with substantial unsupervised access to TOP SECRET assets, or for working in the intelligence or security agencies. This level of clearance involves a Security Check (SC) and, in addition, completion of a DV questionnaire, financial checks, checking of references and a detailed interview with a vetting officer. To gain DV clearance you will normally need to have been resident in the UK for a minimum of 10 years.
Security Check (SC) is for people who have substantial access to SECRET, or occasional access to TOP SECRET, assets and information. This level of clearance involves a BPSS check plus UK criminal and security checks and a credit check. To gain SC clearance you will normally need to have been resident in the UK for a minimum of 5 years.
Counter Terrorist Check (CTC) is required for personnel whose work involves close proximity to public figures, gives access to information or material vulnerable to terrorist attack, or involves unrestricted access to certain government or commercial establishments. A CTC does not allow access to, or knowledge or custody of, protectively marked assets and information. The check includes a Baseline Personnel Security Standard check (BPSS) and also a check against national security records. To gain CTC clearance you will normally need to have been resident in the UK for a minimum of 3 years.

Baseline Personnel Security Standard (BPSS) (formerly Basic Check) and Enhanced Baseline Standard (EBS) (formerly Enhanced Basic Check or Basic Check +): These are not formal security clearances; they are a package of pre-employment checks that represent good recruitment and employment practice.
A BPSS or EBS aims to provide an appropriate level of assurance as to the trustworthiness, integrity, and probable reliability of prospective employees. The check is carried out by screening identity documents and references.
Other Security checks and clearances:
NATO has four levels of security classification: NATO RESTRICTED (NR), NATO CONFIDENTIAL (NC), NATO SECRET (NS) and COSMIC TOP SECRET (CTS) UMBRA.
NATO's clearance levels function independently of any clearance levels for other nations. However, it is understood that for most NATO nations, the granting of a NATO security clearance is handled in a similar manner to that of obtaining a national security clearance.
MPS Vetted
Metropolitan Police Vetting is carried out for all members of the Metropolitan Police Service (police officers, police staff and members of the special constabulary) and for non-police personnel, including contractors, contractors' representatives, consultants, volunteers and any person who requires unescorted access to MPS premises or uncontrolled access to police information.
The MPS has the following Force Vetting levels:
• Initial Vetting Clearance (IVC)
• Management Vetting (MV)
SIA
The Security Industry Authority operates the compulsory licensing of individuals working in specific sectors of the private security industry within the UK.
The activities licensed under the Private Security Industry Act 2001 are:
Manned guarding, which includes:
Cash and Valuables in Transit
Close Protection
Door Supervision
Public Space Surveillance (CCTV)
Security Guarding
Immobilisation, restriction and removal of vehicles
Key Holding
Criminal Records Bureau (CRB) clearance is required for posts that involve working with children or vulnerable adults. Standard Disclosures may also be issued for people entering certain professions, such as members of the legal and accountancy professions. Standard Disclosures contain details of all convictions, cautions, reprimands and warnings held on the Police National Computer (PNC).
Enhanced CRB checks are required for posts involving a far greater degree of contact with children or vulnerable adults, involving regular caring for, supervising, training or being in sole charge of such people, e.g. a teacher or a Scout or Guide leader. Enhanced Disclosures contain the same information as Standard Disclosures but with the addition of local police force information considered relevant by Chief Police Officer(s).

There are three official criminal record disclosure services within the UK:
CRB provides a service for England & Wales.
Disclosure Scotland manages and operates the Disclosure service in Scotland. Disclosures give details of an individual’s criminal convictions (and, in the case of Enhanced Disclosures, where appropriate, non-conviction information).
AccessNI provides a service for Northern Ireland with Disclosures at Basic, Standard and Enhanced levels


Q & A

What is Security Clearance?
Personnel Security vetting is carried out so that people may take certain jobs or carry out tasks that need a national security clearance. These jobs and tasks are located throughout the Ministry of Defence and Armed Forces, as well as in the private sector dealing with defence related work. In addition, a number of other government departments and organisations require Security Clearance.
How do I get a Security Clearance?
First you need a sponsor. Individuals and companies cannot ask for a security clearance unless they are sponsored, and you will not be sponsored unless you (or your company) are contracted (or are in the process of being contracted) to work on one or more specific MOD classified projects.
For large contracts, an officer in the Defence Procurement Agency (DPA) or Defence Logistics Organisation (DLO) - typically a Project Officer - will be your sponsor. For staff in sub-contracted organisations, sponsorship will be provided through the prime contractor.
Why does MOD insist on having sponsors for security clearances? Why can't I just apply for a security clearance?
A security clearance provides a certain level of assurance at a point in time, as to an individual's suitability to have trusted access to sensitive information.
It does not provide a guarantee of future reliability, and all security clearances are kept under review to ensure that the necessary level of assurance is maintained. This review is carried out by Government Departments and Government-sponsored contractors, who are responsible for the oversight and aftercare of individuals granted a security clearance.
The main types of checks and clearances are listed below and are processed by the following Government agencies:
Defence Vetting Agency (DVA)
Foreign and Commonwealth Office (FCO)
Metropolitan Police Service (MPS)
National Security Vetting
Developed Vetting (DV) or (DV Cleared) is required for people with substantial unsupervised access to TOP SECRET assets. The following security vetting stages are mandatory before a DV clearance can be approved:
Baseline Personnel Security Standard (which is normally undertaken as part of the recruiting process)
Departmental / Company Records Check
Security Questionnaire
Criminal Record Check
Credit Reference Check and review of personal finances
Security Service Check
Check of medical and psychological information provided
Subject Interview and further enquiries, which will include interviews with character referees and current and previous supervisors
On completion of the vetting process, the information collected is assessed and a decision made to refuse or approve a DV clearance.
Once a clearance is granted, it is only valid for a pre-determined period, after which a review must be conducted if the clearance is still required. The time interval before a review is required is specified in guidance issued by the Cabinet Office.
A small number of clearances are granted in spite of some reservations. Risk management requires follow-up work and monitoring of some cases. This activity is termed "aftercare", and may be required in connection with any of the above clearances.
Security Check (SC) or (SC Cleared) is required for people who have substantial access to SECRET or occasional controlled access to TOP SECRET assets. The following security vetting stages comprise a full SC clearance:
Baseline Personnel Security Standard (which is normally undertaken as part of the recruiting process)
Departmental / Company Records Check
Security Questionnaire
Criminal Record Check
Credit Reference Check
Security Service Check
On completion of the vetting process, the information collected is assessed and a decision made to refuse or approve a SC clearance.
Counter Terrorist Check (CTC) or (CTC Cleared) is required for people who work in close proximity to public figures, or who have access to information or material vulnerable to terrorist attack, or whose work involves unrestricted access to government or commercial establishments assessed to be at risk from terrorist attack. A CTC does not allow access to, or knowledge or custody of, protectively marked assets, but the Baseline Personnel Security Standard, which is carried out on all MOD personnel and contractors, allows a degree of access. The following security vetting stages are mandatory before a CTC clearance can be approved:
Baseline Personnel Security Standard (which is normally undertaken as part of the recruiting process)
Departmental / Company Records Check
Security Questionnaire
Criminal Record Check
Security Service Check
On completion of the vetting process, the information collected is assessed and a decision made to refuse or approve a CTC clearance.
What are Employment Checks?
Baseline Personnel Security Standard (BPSS) (formerly Basic Check) and Enhanced Baseline Standard (EBS) (formerly Enhanced Basic Check or Basic Check +): These are not formal security clearances; they are a package of pre-employment checks that represent good recruitment and employment practice. A BPSS or EBS aims to provide an appropriate level of assurance as to the trustworthiness, integrity, and probable reliability of prospective employees and should be applied to:
All successful applicants for employment in the public sector and Armed Forces (both permanent and temporary)
All private sector employees working on government contracts (e.g. contractors and consultants), who require access to, or knowledge of, government assets protectively marked up to and including CONFIDENTIAL.
BPSS and EBS are normally conducted by the recruitment authorities or companies to the agreed standard, and because they underpin the national security vetting process it is vital that they are carried out properly and thoroughly and before any further vetting is completed.
Employment Assurance (disclosures) (EA (D)) checks are required for people from MOD sponsored units and organisations that benefit the MOD, who are being considered for employment with children or vulnerable adults. The DVA acts as a co-ordinator for these requests.

Why is the National Security Vetting (NSV) system necessary and what does it aim to achieve?
The UK needs a security system to protect against threats from hostile intelligence services, terrorists and other pressure groups. Vetting ensures that anyone who goes through the process can be trusted with sensitive government information or property.
Who is affected?
The system applies to people in the following categories whose employment involves access to sensitive Government assets: Crown servants; members of the security and intelligence agencies; members of the armed forces; the police; employees of certain other non-government organisations that are obliged to comply with the Government’s security procedures; and employees of contractors providing goods and services to the Government.
How does the vetting system work?
Candidates for jobs that provide access to sensitive information or sites are asked to complete one or more security questionnaires, which invite them to provide the personal details needed to enable the necessary checks to be carried out. Interviews may also be carried out. The depth of checks varies according to the level of regular access to sensitive information that the job entails.
How confidential is the vetting process?
All personal information gathered during the vetting process is handled in the strictest of confidence by the vetting agencies. These bodies include The Defence Vetting Agency (DVA), The Foreign and Commonwealth Office (FCO) and the Metropolitan Police Service (MPS). In a very small number of cases, where serious risks have been identified, a case may be discussed with the Ministry of Defence, security and personnel authorities. In an even smaller number of cases, and only where the person being vetted agrees, line management may be given some relevant information and be asked to help manage the risk. There is an extremely remote possibility of disclosure of vetting information in connection with criminal or civil proceedings.
How do I get a security clearance?
Individuals and companies cannot ask for a Security Clearance unless they are sponsored, and you will not be sponsored unless you (or your company) are contracted (or are in the process of being contracted) to work on one or more specific MOD / Government classified projects. For large contracts, an officer in the Defence Procurement Agency (DPA) or Defence Logistics Organisation (DLO) - typically a Project Officer - will be your sponsor. For staff in sub-contracted organisations, sponsorship will be provided through the prime contractor.
Why can't I just apply for a security clearance?
Security Clearance provides a certain level of assurance, at a point in time, as to an individual’s suitability to have trusted access to sensitive information. It does not provide a guarantee of future reliability, and all security clearances are kept under review to ensure that the necessary level of assurance is maintained. This review is carried out by Government Departments and Government-sponsored contractors, who are responsible for the oversight and aftercare of individuals granted a security clearance. This would not be possible in the case of private individuals.
Security Vetting / Security Clearance is carried out to the following levels by approved government bodies.
Levels of UK Clearance:
DV / Developed Vetting (MOD)
SC / Security Check
CTC / Counter Terrorist Check
EBS / Enhanced Baseline Standard
BPSS / Baseline Personnel Security Standard
NATO / NATO Cleared
MPS / Metropolitan Police Service
SIA / Security Industry Authority
ECRB / Enhanced Criminal Records Bureau
CRB / Criminal Records Bureau

*** I am SC Cleared and working on such a program ***

Tuesday 2 February 2010

Avatar - The computing and data centre behind the making of the film

The computing and data centre behind the making of Avatar

A palm-swept suburb of Wellington, New Zealand is not the first place you'd look for one of the most powerful purpose-built data centers in the world. Yet Miramar, pop. 8,334, is home to precisely that, along with a huge campus of studios, production facilities and soundstages.
The compound is a project that began 15 years ago, inspired by filmmakers Peter Jackson, Richard Taylor and Jamie Selkirk. The studios have since been the main location for creating The Lord of the Rings movies, King Kong, and several others.

Right in the middle sits Weta Digital, the increasingly famous visual effects production house behind high-end commercials and blockbuster movies, most recently the $230 million James Cameron extravaganza AVATAR.
Despite the locale, Weta has drawn plenty of attention. Five Academy Award nominations and four Oscars will do that, but publicist Judy Alley says nothing has matched the buzz of AVATAR. “We’ve done more than 100 interviews in the last few months,” Alley says. With most of the attention focused on the movie’s immersive look, Alley was glad someone was interested in looking at the technology installation that sits within Weta, and kindly connected us with two of the people who make it run.
As they explained, what makes Weta and a project like AVATAR work is in equal parts the computing power of the data center that creates the visual effects, and the data management of artistic processes that drive the computing.
Hot Gear
Weta Digital is really a visual effects job shop that manages thousands of work orders involving intense amounts of data. That largely dictates the fast, constant-capacity equipment required. The data center used to process the effects for AVATAR is Weta’s 10,000 square foot facility, rebuilt and stocked with HP BL2x220c blades in the summer of 2008.
The computing core - 34 racks, each with four chassis of 32 machines each - adds up to some 40,000 processors and 104 terabytes of RAM. The blades read and write against 3 petabytes of fast fibre channel disk in network storage from BluArc and NetApp.
All the gear sits tightly packed and connected by multiple 10-gigabit network links. “We need to stack the gear closely to get the bandwidth we need for our visual effects, and, because the data flows are so great, the storage has to be local,” says Paul Gunn, Weta’s data center systems administrator.
That ruled out colocation or cloud infrastructure, leaving Gunn as a sort of owner-operator responsible for keeping the gear running. It also required some extra engineering for the hardware because the industry standard of raised floors and forced-air cooling could not keep up with the constant heat coming off the machines churning out a project like AVATAR.
Heat exchange for an installation like Weta’s has to be done with enclosed, water-cooled racks, where the hot air is sucked into a radiator and cycled back through the front of the machines. “Plus,” Gunn says, “we run the machines a bit warm, which modern gear doesn’t mind, and the room itself is fairly cool.”

With building costs absorbed, water cooling becomes much less expensive than air conditioning, and the engineering in the data center allows fine tuning. “I don’t want to give you an exact figure,” says Gunn, “but we’re talking tens of thousands of dollars saved by changing the temperature by a degree.”
Because of passive heat exchangers and the local climate, Weta pays no more than the cost of running water pumps to get rid of heat for all but a couple months a year. Just weeks ago, Weta won an energy excellence award for building a smaller footprint that came with 40 percent lower cooling costs for a data center of its type.
Throughput Revisited
The other half of Weta Digital’s processing story comes from the intense visual effects activities that heat up the data center.
Weta is actually two companies: Weta Workshop, where a crew of artists and craftsmen create physical models, and the like-named Weta Digital, which creates digital effects for commercials, short films and blockbuster movies.
"If it's something that you can hold in your hand, it comes from Weta Workshop," says Gunn, "whereas if it's something that doesn't exist, we'll make it."
In the visual effects process, a mix of inputs comes from storyboards, director revisions and tweaking by internal and external digital artists who turn a director’s concept into an image via 3D software such as Maya or Pixar’s RenderMan. Artists work through concepts and iterate versions to get movement and lighting just right. It’s nothing the movie industry hasn’t done all along, says Gunn, only now the tools are different and more data intensive.
The main activity in a visual effects data center is called rendering, the process of turning the digital description of an image into an actual image that can be saved to disk and eventually written to film or another media.
The banks of computers are called render walls, where Joe Wilkie serves as Weta’s “manager wrangler,” the person who oversees the data flow and feeds jobs through the pipeline.
“Wrangler” is a traditional but still common film industry term that first referred to the people who herded the horses and other livestock in Western movies. Likewise, Wilkie says he’s most often called a “render wrangler,” in this case someone who rounds up digital files rather than cattle. “Each part of a movie is an individual item, and it all has to be put together,” he says. “So when an artist is working on a shot, they will hit a button that launches a job on the render wall and loads it into our queueing system.”
The queueing system is a Pixar product called Alfred, which creates a hierarchical job structure or tree of multiple tasks that have to run in a certain order. In any single job, there might be thousands of interdependent tasks. As soon as CPUs on the render wall are freed up, new tasks are fired at idle processors.
At the peak of AVATAR, Wilkie was wrangling more than 10,000 jobs and an estimated 1.3 to 1.4 million tasks per day. Each frame of the 24-frame-per-second movie saw multiple iterations of back and forth between directors and artists, and took multiple hours to render.

For Gunn’s data center, that added up to processing seven or eight gigabytes of data per second, a job that ran 24 hours a day for the last month or more of production. It’s a Goldilocks task of keeping the gear running fast, “not too fast, not too slow, but just right,” Gunn says, to keep the production on schedule. “It’s a complex system and when you’re on deadline with a project like this, you really want to make sure the lights stay on.”
A final film copy of AVATAR is more humble than all the back and forth that occurred in its creation: at 12 megabytes per frame, each second stamped onto celluloid amounts to 288 megabytes, or 17.28 gigabytes per minute. Deduct the credits from the 166-minute movie and you get a better sense of what the final file consists of.
But the immersive effect of AVATAR comes from the many hours or days of attention to each of about 240,000 frames that go into the final product. Weta asked us to mention vfx supervisor Joe Letteri, who oversaw the interactions of directors, a half-dozen lead concept artists and the supporting artists who made the technology process so intensive and powerful.