Monday 12 November 2012

What's in a support proposal

A slim-line look at what should be in a support proposal. The sections are not always needed in the order below, but in general:
<< Just meant to serve as guidance, not a template >>

• Overview
• Scope of Work
• Requirements Coverage
• Your value proposition
• Getting down to the bottom of it: Server Builds, LAN and WAN Configuration and Access Control, Application Deployment, Monitoring and Alerting, Backup and Recovery, IT Health Checks
• Delivery Approach: UAT, Basic OAT
• Post Delivery Project Team Support
• Post Delivery Hardware & O/S Maintenance and Support
• Governance and Technical Direction
• Exclusions
• Risks, Assumptions, Issues & Dependencies
• Outline Transition Phase
• Change Control Process
• Change Impact: Impact Overview, Impact on Service, Impact on Products, Parties Impacted, Impact Level, Timescales Impact, Cost Impact. The Impact Overview carries an impact score (1 = low, 5 = high, tick one box in each row) for each impacted area: Business Criticality, Business Operation, Architecture, Technical Performance, Design & Build Complexity, Operational Complexity, People, Process, Documentation
• Your Case Studies: focus on the business sector and the project/programme/application in question, consider locations (onsite, offshore), and study the client's past awarded contracts – pitfalls and successes

<< Just meant to serve as guidance, not a template >>

Good luck!!!

Monday 27 June 2011

To Surrogate Key or Not

Design Challenge
Information Management Magazine, August 2006
A great way to sharpen our analysis and modeling skills is to continuously address real-world scenarios. A modeling scenario, along with suggested solutions, appears each month in this Design Challenge column. The scenario is emailed to more than 1,000 designers who are up to the challenge, and many of the responses, including my own, are consolidated here. If you would like to become a Design Challenger and have the opportunity to submit modeling solutions...

[Design Challenge scenario figure omitted]
The Response
A surrogate key is a unique identifier for a table, a numeric counter introduced early in the physical design stage. No meaning is implied by a surrogate key value. For example, you can't look at a month identifier of 1 and assume that it represents January. A surrogate key allows for more efficient navigation across structures and facilitates integration across applications. However, before telling the manager to always use surrogates, we need to consider the drawbacks: it can take more time and effort to navigate structures as well as add more complexity to the development phase. Therefore, we need guidelines in determining where it makes sense to create a surrogate key.

Create a surrogate key if the entity has a natural key and any of the following conditions hold:

  • There are multiple sources;
  • The natural key is inefficient; or
  • The natural key is recycled.

Surrogate versus Virtual
A natural key is ideally how the business would identify an entity instance, and a surrogate key is a substitute for this natural key. For example, a natural key for an organization could be taxpayer identifier or DUNS (D&B) Number. If no natural key exists, we can create a virtual key. A virtual key is a unique numeric counter used when it is not possible to identify an entity instance. For example, a manufacturing company has the concept of an adjustment. An adjustment is when a pallet is moved between shelves in a warehouse. The business has no way of identifying this kind of transaction, so the virtual key adjustment identifier is created.

Create a Surrogate if Multiple Sources Exist
If the entity you are modeling contains data from multiple sources, there is usually value in creating a surrogate key. For example, assume you learn that both Henry Winkler the student and Hank Winkler the instructor are really the same person. You can add this person as a unique row to the person entity identified by a person identifier surrogate key. This person can then play the two roles of instructor and student.

Create a Surrogate if the Natural Key is Inefficient
If the natural key is inefficient, create a surrogate key. Inefficient means that the data elements which make up the natural key have performance or space issues when joining across tables. This surfaces most frequently when joining a reference to a transaction table, such as a dimension to a fact table. For example, the business might identify a promotion by a promotion code and promotion start date. Although the code and start date have little impact on a promotion table containing only a few thousand rows, there will be an impact on the multimillion row order and credit tables that reference promotion.

Create a Surrogate if the Natural Key Values are Recycled
While working with a manufacturing plant, I learned that one of their key business concepts is identified by a code whose values repeat every 18 months. We decided to use a surrogate key as a buffer layer and create an alternate key on this natural key code plus a date. As an aside, it is a very good practice to always define an alternate key on the natural key when using a surrogate.
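To make that concrete, here is a minimal DDL sketch (the table and column names are invented, and the syntax is generic rather than tied to any one database): the surrogate is the primary key, and an alternate key on the natural code plus a date keeps recycled codes unique.

create table product_batch (
    batch_sid       integer      not null,  -- surrogate key, e.g. populated from a sequence or identity column
    batch_code      varchar(10)  not null,  -- natural key code whose values recycle every 18 months
    batch_start_dt  date         not null,  -- date that disambiguates recycled codes
    constraint pk_product_batch primary key (batch_sid),
    constraint ak_product_batch unique (batch_code, batch_start_dt)  -- alternate key on the natural key
);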

Every set of guidelines has exceptions, and for surrogate keys a very real exception is the limitation of application development tools. There is technology in use that requires surrogates while other tools prohibit them.

Steve Hoberman is one of the world's most well-known data modeling gurus. He taught his first data modeling class in 1992 and has educated more than 10,000 people about data modeling and business intelligence techniques since then.

Problem with History - a look at surrogate key usage

A surrogate key is a substitute for the natural primary key.

It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.

Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the dimension tables' primary keys. The values can come from an Informatica sequence generator, an Oracle sequence, or SQL Server identity columns.

It is useful because the natural primary key (e.g. Customer Number in the Customer table) can change, and this makes updates more difficult.

Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the primary keys (according to the business users), but not only can these change, indexing on a numerical value is probably better, so you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system and, as far as the client is concerned, you may display only the AIRPORT_NAME.



Another benefit you get from surrogate keys (SIDs) is tracking Slowly Changing Dimensions (SCDs).

Let me give you a simple, classical example:

On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's what would be in your Employee Dimension). This employee has turnover allocated to him on Business Unit 'BU1'. But on the 2nd of June the Employee 'E1' is moved from Business Unit 'BU1' to Business Unit 'BU2'. All the new turnover has to belong to the new Business Unit 'BU2', but the old turnover should belong to Business Unit 'BU1'.

If you used the natural business key 'E1' for your employee within your data warehouse, everything would be allocated to Business Unit 'BU2', even what actually belongs to 'BU1'.

If you use surrogate keys, you could create on the 2nd of June a new record for the Employee 'E1' in your Employee Dimension with a new surrogate key.

This way, in your fact table, you have your old data (before 2nd of June) with the SID of the Employee 'E1' + 'BU1.' All new data (after 2nd of June) would take the SID of the employee 'E1' + 'BU2.'

You could consider a Slowly Changing Dimension as an enlargement of your natural key: the natural key of the Employee was Employee Code 'E1', but for you it becomes Employee Code + Business Unit - 'E1' + 'BU1' or 'E1' + 'BU2'. The difference from a simple natural-key enlargement, though, is that you might not have every part of the new key in your fact table, so you might not be able to join on the enlarged key - which is why you need another identifier.
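As a minimal sketch of the example above (the names dim_employee, fact_turnover and the surrogate values are invented for illustration), the Employee dimension ends up holding two rows for 'E1', each with its own surrogate key, and the fact table joins on that surrogate:

-- dim_employee (illustrative rows)
-- emp_sid | emp_code | business_unit | valid_from  | valid_to
--     101 | E1       | BU1           | 01-Jan-2002 | 01-Jun-2002
--     102 | E1       | BU2           | 02-Jun-2002 | 31-Dec-9999

-- Turnover loaded before 2nd June carries emp_sid 101, later turnover carries 102,
-- so aggregating by business unit allocates history correctly:
select d.business_unit
      ,sum(f.turnover_amount) as total_turnover
from fact_turnover f
     inner join dim_employee d on f.emp_sid = d.emp_sid
group by d.business_unit;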



The Problem with History
Performing an initial load of a type 2 dimension from a history-tracking source
A source system that keeps its own history presents interesting problems when used to load a type 2 history-tracking dimension table. The problem I am referring to arose in an actual project where we had a source system (SAP) which was keeping its own history of changes to an entity in a comprehensive, audit-trail sort of way. It appeared to have been tracking every change to the entity in question. But, in the data warehouse, for the corresponding dimension we only wanted to track history for a subset of attributes, treating the rest as type 1 attributes. This article focuses on the initial load, not the continuing maintenance, of a dimension from a set of history-tracking source records.

Let’s say that, in the source system (an HR application, for instance), we have the records shown in Figure 1.
 
[Figure 1: source system records]

In this table, EMPKEY is the "person key" or "person ID," which is a surrogate key in the source system itself. Social Security Number, which is often thought of as a key, is a simple attribute. The primary key of the table is EMPKEY + VALID_FROM. A real HR system would obviously have more attributes than this, but, for purposes of this article, all we need are a mix of type 1 and type 2 attributes.

The table tells a tale. Jo, a female, living in Michigan, gets hired on 12/3/1998. On 12/27/1998, the HR staffers discover they entered her SSN incorrectly and change it; the source system automatically creates a new record, with a new VALID_FROM date of 12/28/1998. Seven years go by until, in April of 2005, Jo, who’s always felt uncomfortable as a female, goes into the online HR portal and changes her first name to "Joe"; the source system dutifully tracks the change. On August 8, Joe finally gets her operation, and an HR staff member changes her/his sex from "F" to "M," and her/his name to "Joseph." On February 13, Joseph decides he really prefers to go by "Joe" again, until July 5, when he flips his name back to "Joseph" and transfers to the company’s Big Apple office. On Christmas Eve, weary of the brusque and chilly streets of Manhattan, Joseph returns to Michigan. Jim or James, the other person in this table, has a simpler history: he changes from "Jim" to "James" on 3/16/2004 and then moves to Indiana on 6/23/2007. He’s apparently let go on 8/31/2007. (In this HR system, if your latest record doesn’t have a NULL in "VALID_TO," it means you’re terminated as of that date.)

The business users have stated that they don’t care about tracking or reporting on historical changes to first names or SSNs, but they do care about sex and state. In data warehousing terms, FIRSTNAME and SSN are type 1 attributes, and SEX and STATE are type 2.

We are doing an initial load of our PERSON dimension. Again, we do not cover, in this article, how to deal with comparisons between a history-tracking, type 2-ish source and an already-populated dimension table.

Ultimately, our PERSON dimension table should look like Figure 2.

[Figure 2: the target PERSON dimension]

DIM_K is just a standard dimension table surrogate key; only those records for the two people in question are displayed. Note that where we have a currently active record with no "natural" or source end date (the employee is still employed, the person is still alive, etc.), we enter a fake "high date" to make querying and joins easier. (This sort of thing will, doubtless, cause a massive "Year 9999 Problem.") Our PERSON dimension table tracks history for those attributes we care about, but washes out the history for the type 1 attributes. To get to this blissful state, we need to sift out the type 1 changes from the type 2 and only make new dimension records where we have a type 2 change.

In what follows, we’ll stick to fairly simple set logic. We’ll use no cursors or other looping-type constructs. While an ETL tool like Informatica or DataStage would be handy, any of this could be done fairly easily in straight SQL. We’ll also keep all the steps simple, easy to understand and discrete. It’s possible to create enormous heaps of nested SQL to do everything in one statement, but it's best to keep everything understandable. We create and label "tables," but whether you choose to actually create real database tables, or the tables are just record sets inside a flow, the process remains essentially unchanged.

Our first step is to order and number our records. Depending on what system you’re using, there will be any number of ways to do this, but start by ordering your source records by EMPKEY, VALID_FROM and add a row numbering RANK column (see Figure 3).

[Figure 3: source records ordered and ranked]
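As a hedged sketch of that step (assuming a database with window functions and Oracle-style create-table-as-select; the source extract name SRC_PERSON_HISTORY is invented, and the column names RANK and KEY follow the article and may need quoting on some platforms):

create table src_ranked as
select row_number() over (order by empkey, valid_from) as rank  -- global row number
      ,empkey as key
      ,first_name
      ,ssn
      ,sex
      ,state
      ,valid_from
      ,valid_to
from src_person_history;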

Next, we’ll need the earliest and latest records for each natural key. The earliest ones will form a sort of "base" for future joins; the latest ones carry the values we’ll use for the type 1 attributes. (The business rule here is that the latest records hold the type 1 attributes to use; other rules would demand different coding.)

NK_EARLIEST
select
s.*
from
src_ranked s inner join
(select key, min(rank) r from src_ranked group by key) mins
on s.key = mins.key and s.rank = mins.r
[NK_EARLIEST result set]
NK_LATEST
select
s.*
from
src_ranked s inner join
(select key, max(rank) r from src_ranked group by key) maxes
on s.key = maxes.key and s.rank = maxes.r
[NK_LATEST result set]
The existence of this NK_LATEST table will serve us well later on.
Next, let’s make a table of the type 2 columns only. Within these we’ll need to sift out the records for a given natural key where there’s no change.

TYPE2_COLS
select
rank
,key
,sex
,state
,valid_from
,valid_to
from
src_ranked
order by
rank
[Figure 6: TYPE2_COLS]
In the table in Figure 6, the column values that will form the VALID_FROM’s in our dimension are highlighted in red, and those that will supply the VALID_TO are blue.

For our next trick, inner join this table to itself, each record to the record one above it, in rank, where the key (natural key) is the same and where there’s something different about the type 2 attributes:

TYPE2_CHANGES
select
t2c1.rank rank1
,t2c1.key key1
,t2c1.sex sex1
,t2c1.state state1
,t2c1.valid_from valid_from1
,t2c1.valid_to valid_to1
,t2c2.rank rank2
,t2c2.key key2
,t2c2.sex sex2
,t2c2.state state2
,t2c2.valid_from valid_from2
,t2c2.valid_to valid_to2
from
type2_cols t2c1 inner join type2_cols t2c2
on t2c1.rank + 1 = t2c2.rank
and t2c1.key = t2c2.key
where
t2c1.sex <> t2c2.sex or t2c1.state <> t2c2.state
[Figure 7: TYPE2_CHANGES result set]
We can see that we’re very close to what we’re looking for in the VALID_TO1 and VALID_FROM2 columns. But what we want (leaving out the irrelevant columns) is shown in Figure 8.

[Figure 8: target VALID_FROM/VALID_TO pairs]
To get there, do a quasi-pivot of TYPE2_CHANGES, like this:
TYPE2_C_PIV
select * from
(
select
rank1 rank
,key1 key
,sex1 sex
,state1 state
,valid_to1 valid_date
from
type2_changes
union
select
rank2 rank
,key2 key
,sex2 sex
,state2 state
,valid_from2 valid_date
from
type2_changes
) s
order by rank
[TYPE2_C_PIV result set]
Add the handy-dandy, presaved NK_EARLIEST and NK_LATEST records back in, and order by rank:
TYPE2_C_ALL
select * from
(
select * from type2_c_piv
union
select
rank
,key
,sex
,state
,valid_from valid_date
from nk_earliest
union
select
rank
,key
,sex
,state
,IsNull(valid_to,'12/31/9999') valid_date
from nk_latest
) s
order by rank, valid_date
[TYPE2_C_ALL result set]

Put an ordinal or row number on these rows again:

[TYPE2_C_ALL with row numbers (NEW_RANK) added]
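The re-ranking can be sketched the same way as the earlier ranking step (again assuming window-function support; names follow the article):

create table type2_c_all_ranked as
select row_number() over (order by rank, valid_date) as new_rank
      ,rank
      ,key
      ,sex
      ,state
      ,valid_date
from type2_c_all;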
The odd-numbered records contain the VALID_FROM dates, and the next-higher-numbered row contains the VALID_TO dates. Join ‘em, Dano. The following assumes the availability of a mod (modulus) operator to find the odd-numbered rows; there are certainly other ways to find odd numbers.

PRE_DIM
select
t1.key
,t1.sex
,t1.state
,t1.valid_date 'valid_from'
,t2.valid_date 'valid_to'
from
type2_c_all_ranked t1 inner join type2_c_all_ranked t2
on t1.new_rank + 1 = t2.new_rank
where
mod (t1.new_rank,2) = 1
[PRE_DIM result set]
The remainder is trivial. Join to the "latest" records on the natural key to pick up the type 1 attributes, and, while it’s handy, set the current flag:

Add_T1 ("add type 1")
select
pd.key as 'nat_k'
,l.first_name
,l.ssn
,pd.sex
,pd.state
,pd.valid_from
,pd.valid_to
,case when pd.valid_from = l.valid_from then 'Y' else 'N' end as 'current_flag'
from
pre_dim pd inner join nk_latest l
on pd.key = l.key
Then, generate surrogate keys for your target dimension table and load it.
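A possible final load, shown only as a sketch (DIM_PERSON is an invented target name; an Oracle sequence or an identity column would serve just as well as the ROW_NUMBER call for generating DIM_K):

insert into dim_person
       (dim_k, nat_k, first_name, ssn, sex, state, valid_from, valid_to, current_flag)
select row_number() over (order by nat_k, valid_from) as dim_k
      ,nat_k
      ,first_name
      ,ssn
      ,sex
      ,state
      ,valid_from
      ,valid_to
      ,current_flag
from add_t1;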





Thursday 25 February 2010

Bind variables - the key to application performance, but they rule out star transformations in a data warehouse.

If you've been developing applications on Oracle for a while, you've no doubt come across the concept of "Bind Variables". Bind variables are one of those Oracle concepts that experts frequently cite as being key to application performance, but it's often not all that easy to pin down exactly what they are and how you need to alter your programming style to use them.

To understand bind variables, consider an application that generates thousands of SELECT statements against a table; for example:

SELECT fname, lname, pcode FROM cust WHERE id = 674;
SELECT fname, lname, pcode FROM cust WHERE id = 234;
SELECT fname, lname, pcode FROM cust WHERE id = 332;

Each time the query is submitted, Oracle first checks in the shared pool to see whether this statement has been submitted before. If it has, the execution plan that this statement previously used is retrieved, and the SQL is executed. If the statement cannot be found in the shared pool, Oracle has to go through the process of parsing the statement, working out the various execution paths and coming up with an optimal access plan before it can be executed. This process is known as a "hard parse" and for OLTP applications can actually take longer to carry out than the DML statement itself.

When looking for a matching statement in the shared pool, only statements whose text matches exactly are considered; so, if every SQL statement you submit is unique (in that the predicate changes each time, from id = 674 to id = 234 and so on) then you'll never get a match, and every statement you submit will need to be hard parsed. Hard parsing is very CPU intensive, and involves obtaining latches on key shared memory areas, which, whilst it might not affect a single program running against a small set of data, can bring a multi-user system to its knees if hundreds of copies of the program are trying to hard parse statements at the same time. The extra bonus with this problem is that contention caused by hard parsing is pretty much immune to measures such as increasing available memory, numbers of processors and so on, as hard parsing statements is one thing Oracle can't do concurrently with many other operations, and it's a problem that often only comes to light when trying to scale up a development system from a single user working on a subset of records to many hundreds of users working on a full data set.

The way to get Oracle to reuse the execution plans for these statements is to use bind variables. Bind variables are "substitution" variables that are used in place of literals (such as 674, 234, 332) and that have the effect of sending exactly the same SQL to Oracle every time the query is executed. For example, in our application, we would just submit

SELECT fname, lname, pcode FROM cust WHERE id = :cust_no;

and this time we would be able to reuse the execution plan every time, reducing the latch activity in the SGA, and therefore the total CPU activity, which has the effect of allowing our application to scale up to many users on a large dataset.

Bind Variables in SQL*Plus

In SQL*Plus you can use bind variables as follows:

SQL> variable deptno number
SQL> exec :deptno := 10
SQL> select * from emp
where deptno = :deptno;

What we've done to the SELECT statement now is take the literal value out of it, and replace it with a placeholder (our bind variable), with SQL*Plus passing the value of the bind variable to Oracle when the statement is processed. This bit is fairly straightforward: you declare a bind variable in SQL*Plus, then reference the bind variable in the SELECT statement.

Bind Variables in PL/SQL

Taking PL/SQL first of all, the good news is that PL/SQL itself takes care of most of the issues to do with bind variables, to the point where most code that you write already uses bind variables without you knowing. Take, for example, the following bit of PL/SQL:

create or replace procedure dsal(p_empno in number)
as
begin
update emp
set sal=sal*2
where empno = p_empno;
commit;
end;
/

Now you might be thinking that you've got to replace the p_empno with a bind variable. However, the good news is that every reference to a PL/SQL variable is in fact a bind variable.

Dynamic SQL

In fact, the only time you need to consciously decide to use bind variables when working with PL/SQL is when using Dynamic SQL.

Dynamic SQL allows you to execute a string containing SQL using the EXECUTE IMMEDIATE command. For example, the following procedure would always require a hard parse when it is submitted:

create or replace procedure dsal(p_empno in number)
as
begin
execute immediate
'update emp set sal = sal*2 where
empno = '||p_empno;
commit;
end;
/

The way to use bind variables instead is to change the EXECUTE IMMEDIATE command as follows:

create or replace procedure dsal(p_empno in number)
as
begin
execute immediate
'update emp set
sal = sal*2 where
empno = :x' using p_empno;
commit;
end;
/

And that's all there is to it. One thing to bear in mind, though, is that you can't substitute actual object names (tables, views, columns etc.) with bind variables - you can only substitute literals. If the object name is generated at runtime, you'll still need to string concatenate these parts, and the SQL will only match with those already in the shared pool when the same object name comes up. However, whenever you're using dynamic SQL to build up the predicate part of a statement, use bind variables instead and you'll dramatically reduce the amount of latch contention going on.
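For instance, a sketch along these lines (the procedure name dsal_tab and the p_table parameter are invented for illustration) concatenates the object name, which cannot be bound, but still binds the literal in the predicate:

create or replace procedure dsal_tab(p_table in varchar2, p_empno in number)
as
begin
-- the table name has to be concatenated into the statement text,
-- but the predicate value is still passed as a bind variable
execute immediate
'update ' || p_table || ' set sal = sal*2 where empno = :x'
using p_empno;
commit;
end;
/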

The Performance Killer

Just to give you an idea of how big a difference this can make performance-wise, you only need to run a very small test:

Here is the Performance Killer ....

SQL> alter system flush shared_pool;
SQL> set serveroutput on;

declare
type rc is ref cursor;
l_rc rc;
l_dummy all_objects.object_name%type;
l_start number default dbms_utility.get_time;
begin
for i in 1 .. 1000
loop
open l_rc for
'select object_name
from all_objects
where
object_id = ' || i;
fetch l_rc into l_dummy;
close l_rc;
-- dbms_output.put_line(l_dummy);
end loop;
dbms_output.put_line
(round((dbms_utility.get_time-l_start)/100, 2) ||
' Seconds...' );
end;
/
101.71 Seconds...

... and here is the Performance Winner:

declare
type rc is ref cursor;
l_rc rc;
l_dummy all_objects.object_name%type;
l_start number default dbms_utility.get_time;
begin
for i in 1 .. 1000
loop
open l_rc for
'select object_name
from all_objects
where
object_id = :x'
using i;
fetch l_rc into l_dummy;
close l_rc;
-- dbms_output.put_line(l_dummy);
end loop;
dbms_output.put_line
(round((dbms_utility.get_time-l_start)/100, 2) ||
' Seconds...' );
end;
/
1.9 Seconds...

That is pretty dramatic. The fact is that not only does this execute much faster (we spent more time PARSING our queries than actually EXECUTING them!), it will also let more users use your system simultaneously.

Bind Variables in VB, Java and other applications

The next question, though, is: what about VB, Java and other applications that fire SQL queries against an Oracle database? How do these use bind variables? Do you in fact have to split your SQL into two statements, one to set the bind variable, and one for the statement itself?

In fact, the answer to this is actually quite simple. When you put together an SQL statement using Java, or VB, or whatever, you usually use an API for accessing the database; ADO in the case of VB, JDBC in the case of Java. All of these APIs have built-in support for bind variables, and it's just a case of using this support rather than just concatenating a string yourself and submitting it to the database.

For example, Java has PreparedStatement, which allows the use of bind variables, and Statement, which uses the string concatenation approach. If you use the method that supports bind variables, the API itself passes the bind variable value to Oracle at runtime, and you just submit your SQL statement as normal. There's no need to separately pass the bind variable value to Oracle, and actually no additional work on your part. Support for bind variables isn't just limited to Oracle - it's common to other RDBMS platforms such as Microsoft SQL Server, so there's no excuse for not using them just because they might be an Oracle-only feature.

Lastly, it's worth bearing in mind that there are some instances where bind variables are probably not appropriate, usually where instead of your query being executed many times a second (as with OLTP systems) your query in fact actually takes several seconds, or minutes, or hours to execute - a situation you get in decision support and data warehousing. In this instance, the time taken to hard parse your query is only a small proportion of the total query execution time, and the benefit of avoiding a hard parse is probably outweighed by the reduction in important information you're making available to the query optimizer - by substituting the actual predicate with a bind variable, you're removing the ability for the optimiser to compare your value with the data distribution in the column, which might make it opt for a full table scan or an index when this isn't appropriate. Oracle 9i helps deal with this using a feature known as bind variable peeking, which allows Oracle to look at the value behind a bind variable to help choose the best execution plan.

Another potential drawback with bind variables and data warehousing queries is that the use of bind variables disallows the potential for star transformations, taking away this powerful option for efficiently joining fact and dimension tables in a star schema.

Wednesday 10 February 2010

How should you approach a new data profiling engagement?


Data profiling is best scheduled prior to system design, typically occurring during the discovery or analysis phase. The first step -- and also a critical dependency -- is to clearly identify the appropriate person to provide the source data and also serve as the “go to” resource for follow-up questions. Once you receive source data extracts, you’re ready to prepare the data for profiling. As a tip, loading data extracts into a database structure will allow you to freely write SQL to query the data while also having the flexibility to use a profiling tool if needed.
When creating or updating a data profile, start with basic column-level analysis such as:
Distinct count and percent: Analyzing the number of distinct values within each column will help identify possible unique keys within the source data (which I’ll refer to as natural keys). Identification of natural keys is a fundamental requirement for database and ETL architecture, especially when processing inserts and updates. In some cases, this information is obvious based on the source column name or through discussion with source data owners. However, when you do not have this luxury, distinct percent analysis is a simple yet critical tool to identify natural keys.
Zero, blank, and NULL percent: Analyzing each column for missing or unknown data helps you identify potential data issues. This information will help database and ETL architects set up appropriate default values or allow NULLs on the target database columns where an unknown or untouched (i.e., NULL) data element is an acceptable business case. This analysis may also spawn exception or maintenance reports for data stewards to address as part of day-to-day system maintenance.
Minimum, maximum, and average string length: Analyzing string lengths of the source data is a valuable step in selecting the most appropriate data types and sizes in the target database. This is especially true in large and highly accessed tables where performance is a top consideration. Reducing the column widths to be just large enough to meet current and future requirements will improve query performance by minimizing table scan time. If the respective field is part of an index, keeping the data types in check will also minimize index size, overhead, and scan times.
Numerical and date range analysis: Gathering information on minimum and maximum numerical and date values is helpful for database architects to identify appropriate data types to balance storage and performance requirements. If your profile shows a numerical field does not require decimal precision, consider using an integer data type because of its relatively small size. Another issue which can easily be identified is converting Oracle dates to SQL Server. Until SQL Server 2008, the earliest possible datetime date was 1/1/1753 which often caused issues in conversions with Oracle systems.
With the basic data profile under your belt, you can conduct more advanced analysis such as:
Key integrity: After your natural keys have been identified, check the overall integrity by applying the zero, blank, and NULL percent analysis to the data set. In addition, checking the related data sets for any orphan keys is extremely important to reduce downstream issues. For example, all customer keys from related transactions (e.g., orders) should exist in the customer base data set; otherwise you risk understating aggregations grouped by customer-level attributes.
Cardinality: Identification of the cardinality (e.g. one-to-one, one-to-many, many-to-many, etc.) between the related data sets is important for database modeling and business intelligence (BI) tool set-up. BI tools especially need this information to issue the proper inner- or outer-join clause to the database. Cardinality considerations are especially apparent for fact and dimension relationships.
Pattern, frequency distributions, and domain analysis: Examination of patterns is useful to check if data fields are formatted correctly. As an example, you might validate e-mail address syntax to ensure it conforms to user@domain. This type of analysis can be applied to most columns but is especially practical for fields that are used for outbound communication channels (e.g., phone numbers and address elements). Frequency distributions are typically simple validations such as “customers by state” or “total of sales by product” and help to authenticate the source data before designing the database. Domain analysis is validation of the distribution of values for a given data element. Basic examples of this include validating customer attributes such as gender or birth date, or address attributes such as valid states or provinces within a specified region. Although these steps may not play as critical a role in designing the system, they are very useful for uncovering new and old business rules.
Picking the right techniques depends on the project objectives. If you’re building a new database from scratch, take the time to execute and review outcomes of each of the above bullet points. If you’re simply integrating a new data set into an existing database, select the most applicable tasks that apply to your source data.
All of these steps may be conducted by writing raw SQL. The basic profiling steps can usually be accomplished using a tool designed specifically for data profiling. Many third-party data profiling tools have been introduced into the marketplace over the last several years to help streamline the process. These tools typically allow the user to point to a data source and select the appropriate profile technique(s) to apply. The outputs of these tools vary, but usually a data source summary is produced with the field level profile statistics.
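As a rough illustration of the raw-SQL route (the staging table and column names below are invented, and the functions are Oracle-flavoured - adjust LENGTH and the like for your platform), the basic checks might look something like this:

-- Distinct count and percent for a candidate natural key
select count(distinct customer_number) as distinct_cnt
      ,count(distinct customer_number) * 100.0 / count(*) as distinct_pct
from stg_customer;

-- NULL percent for a column (zero and blank checks follow the same CASE pattern)
select sum(case when email is null then 1 else 0 end) * 100.0 / count(*) as null_pct
from stg_customer;

-- Minimum, maximum and average string length
select min(length(city_name)) as min_len
      ,max(length(city_name)) as max_len
      ,avg(length(city_name)) as avg_len
from stg_customer;

-- Numerical and date range analysis
select min(order_amount), max(order_amount), min(order_date), max(order_date)
from stg_order;

-- Key integrity: transaction rows whose customer key has no parent in the customer set
select count(*) as orphan_cnt
from stg_order o
where not exists (select 1 from stg_customer c
                  where c.customer_number = o.customer_number);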

Furthermore, understanding the available data, the missing data, and the required data can help map out future technical strategies and data capture methods. Using this information to improve data capture techniques will improve the source data integrity and may lead to further improvements in overall customer engagement and intelligence.
The data profiling process may seem arduous and less than glamorous at times, but it is an important step which adds value to any database project.

Wednesday 3 February 2010

What is Security Clearance in the UK?

What is Security Clearance?
National Security Vetting is carried out so that people can work on, or carry out, tasks which in the course of doing so require a national security clearance. Government organisations including the Ministry of Defence, Central Government, Defence Estates and the Armed Forces require Security Cleared personnel, as do companies in the private sector contracted to undertake work for these bodies. Security Clearance levels vary depending upon the sensitivity of the information that is accessed.
The main Security Clearing bodies are:
The Defence Vetting Agency (DVA)
Foreign and Commonwealth Office (FCO)
Metropolitan Police Service (MPS)
How to obtain Security Clearance
You cannot apply for Security Clearance as an individual. Clearance is requested by an employer and carried out by Government agencies. Security Clearance is granted for a specific period of time depending on the employment term or for a particular project.
Security Clearance can be verified and transferred to a new employer if required. If you do not have the Security Clearance required for a particular role you would not be able to start your employment until clearance has been obtained.
You do not have to be a British National in order to gain Security Clearance, but you will have to meet the following criteria depending on the level of clearance required.
There are four main types of national security vetting and clearances:
Developed Vetting (DV): This is the highest level of Security Clearance and is required for people with substantial unsupervised access to TOP SECRET assets, or for working in the intelligence or security agencies. This level of clearance involves a Security Check (SC) and, in addition, completion of a (DV) questionnaire, financial checks, checking of references and a detailed interview with a vetting officer. To gain (DV) clearance you will normally need to have been resident in the UK for a minimum of 10 years.
Security Check (SC) is for people who have substantial access to SECRET, or occasional access to TOP SECRET assets and information. This level of clearance involves a (BPSS) check plus UK criminal and security checks and a credit check. To gain (SC) clearance you will normally need to have been resident in the UK for a minimum of 5 years.
Counter Terrorist Check (CTC) is required for personnel whose work involves close proximity to public figures, gives access to information or material vulnerable to terrorist attack or involves unrestricted access to certain government or commercial establishments. A (CTC) does not allow access to, or knowledge or custody of, protectively marked assets and information. The check includes a Baseline Personnel Security Standard Check (BPSS) and also a check against national security records. To gain (CTC) clearance you will normally need to have been resident in the UK for a minimum of 3 years.

Baseline Personnel Security Standard (BPSS) (formerly Basic Check) and Enhanced Baseline Standard (EBS) (formerly Enhanced Basic Check or Basic Check +): These are not formal security clearances; they are a package of pre-employment checks that represent good recruitment and employment practice.
A BPSS or EBS aims to provide an appropriate level of assurance as to the trustworthiness, integrity, and probable reliability of prospective employees. The check is carried out by screening identity documents and references.
Other Security checks and clearances:
NATO has four levels of security classification: NATO RESTRICTED (NR), NATO CONFIDENTIAL (NC), NATO SECRET (NS) and COSMIC TOP SECRET (CTS) UMBRA.
NATO's clearance levels function independently of any clearance levels for other nations. However, it is understood that for most NATO nations, the granting of a NATO security clearance is handled in a similar manner to that of obtaining a national security clearance.
MPS Vetted
Metropolitan Police Vetting is carried out for all members of the Metropolitan Police Service (police officers, police staff and members of the special constabulary) and for non-police personnel, including contractors, contractors' representatives, consultants, volunteers and any person who requires unescorted access to MPS premises or uncontrolled access to police information.
The MPS has the following Force Vetting levels:
• Initial Vetting Clearance (IVC)
• Management Vetting (MV)
SIA
The Security Industry Authority operates the compulsory licensing of individuals working in specific sectors of the private security industry within the UK.
The activities licensed under the Private Security Industry Act 2001 are:
Manned guarding, which includes:
Cash and Valuables in Transit
Close Protection
Door Supervision
Public Space Surveillance (CCTV)
Security guard
Immobilisation, restriction and removal of vehicles
Key Holding
Criminal Records Bureau (CRB) clearance is required for posts that involve working with children or vulnerable adults. Standard Disclosures may also be issued for people entering certain professions, such as members of the legal and accountancy professions. Standard Disclosures contain the following: details of all convictions, cautions, reprimands and warnings held on the Police National Computer (PNC).
Enhanced CRB checks are required for posts involving a far greater degree of contact with children or vulnerable adults involving regular caring for, supervising, training or being in sole charge of such people i.e. Teacher, Scout or Guide leader. Enhanced Disclosures contain the same information as the Standard Disclosures but with the addition of local police force information considered relevant by Chief Police Officer(s).

There are three official criminal record disclosure services within the UK:
CRB provides a service for England & Wales.
Disclosure Scotland is a service provided to manage and operate the Disclosure service in Scotland. Disclosures give details of an individual’s criminal convictions (and in the case of Enhanced Disclosures, where appropriate, non-conviction information).
AccessNI provides a service for Northern Ireland with Disclosures at Basic, Standard and Enhanced levels


Q & A

What is Security Clearance?
Personnel Security vetting is carried out so that people may take certain jobs or carry out tasks that need a national security clearance. These jobs and tasks are located throughout the Ministry of Defence and Armed Forces, as well as in the private sector dealing with defence related work. In addition, a number of other government departments and organisations require Security Clearance.
How do I get a Security Clearance?
First you need a sponsor. Individuals and companies cannot ask for a security clearance unless they are sponsored, and you will not be sponsored unless you (or your company) are contracted, or are in the process of being contracted, to work on one or more specific MOD classified projects.
For large contracts, an officer in the Defence Procurement Agency (DPA) or Defence Logistics Organisation (DLO) - typically a Project Officer will be your sponsor. For staff in sub-contracted organisations, sponsorship will be provided through the prime contractor.
Why does MOD insist on having sponsors for security clearances? Why can't I just apply for a security clearance?
A security clearance provides a certain level of assurance at a point in time, as to an individual's suitability to have trusted access to sensitive information.
It does not provide a guarantee of future reliability, and all security clearances are kept under review to ensure that the necessary level of assurance is maintained. This review is carried out by Government Departments and Government-sponsored contractors, who are responsible for the oversight and aftercare of individuals granted a security clearance.
The main types of checks and clearances are listed below and are processed by the following Government agencies:
Defence Vetting Agency (DVA)
Foreign and Commonwealth Office (FCO)
Metropolitan Police Service (MPS)
National Security Vetting
Developed Vetting (DV) or (DV Cleared) is required for people with substantial unsupervised access to TOP SECRET assets. The following security vetting stages are mandatory before a DV clearance can be approved:
Baseline Personnel Security Standard (Which is normally undertaken as part of the recruiting process)
Departmental / Company Records Check
Security Questionnaire
Criminal Record Check
Credit Reference Check and review of personal finances
Security Service Check
Check of medical and psychological information provided
Subject Interview and further enquiries, which will include interviews with character referees and current and previous supervisors
On completion of the vetting process, the information collected is assessed and a decision made to refuse or approve a DV clearance.
Once a clearance is granted, it is only valid for a pre-determined period after which a review must be conducted if the clearance is still required. The time interval before a review is required is specified in guidance issued by the cabinet office.
A small number of clearances are granted in spite of some reservations. Risk management requires follow-up work and monitoring of some cases. This activity is termed "aftercare", and may be required in connection with any of the above clearances.
Security Check (SC) or (SC Cleared) is required for people who have substantial access to SECRET or occasional controlled access to TOP SECRET assets. The following security vetting stages comprise a full SC clearance:
Baseline Personnel Security Standard (Which is normally undertaken as part of the recruiting process)
Departmental / Company Records Check
Security Questionnaire
Criminal Record Check
Credit Reference Check
Security Service Check
On completion of the vetting process, the information collected is assessed and a decision made to refuse or approve a SC clearance.
Counter Terrorist Check (CTC) or (CTC Cleared) is required for people who work in close proximity to public figures, or who have access to information or material vulnerable to terrorist attack, or whose work involves unrestricted access to government or commercial establishments assessed to be at risk from terrorist attack. A CTC does not allow access to, or knowledge or custody of, protectively marked assets, but the Baseline Personnel Security Standard, which is carried out on all MOD personnel and contractors, allows a degree of access. The following security vetting stages are mandatory before a CTC clearance can be approved:
Baseline Personnel Security Standard (Which is normally undertaken as part of the recruiting process)
Departmental / Company Records Check
Security Questionnaire
Criminal Record Check
Security Service Check
On completion of the vetting process, the information collected is assessed and a decision made to refuse or approve a CTC clearance.
What are Employment Checks?
Baseline Personnel Security Standard (BPSS) (formerly Basic Check) and Enhanced Baseline Standard (EBS) (formerly Enhanced Basic Check or Basic Check +): These are not formal security clearances; they are a package of pre-employment checks that represent good recruitment and employment practice. A BPSS or EBS aims to provide an appropriate level of assurance as to the trustworthiness, integrity, and probable reliability of prospective employees and should be applied to:
All successful applicants for employment in the public sector and Armed Forces (both permanent and temporary)
All private sector employees working on government contracts (e.g. contractors and consultants), who require access to, or knowledge of, government assets protectively marked up to and including CONFIDENTIAL.
BPSS and EBS are normally conducted by the recruitment authorities or companies to the agreed standard, and because they underpin the national security vetting process it is vital that they are carried out properly and thoroughly and before any further vetting is completed.
Employment Assurance (disclosures) (EA (D)) are required by people from MOD sponsored units and organisations that benefit the MOD, who are being considered for employment with children or vulnerable adults. DVA acts as a co-ordinator for these requests.

Why is the National Security Vetting (NSV) system necessary and what does it aim to achieve?
The UK needs a security system to protect against threats from hostile intelligence services, terrorists and other pressure groups. Vetting ensures that anyone who goes through the process can be trusted with sensitive government information or property.
Who is affected?
The system applies to people in the following categories whose employment involves access to sensitive Government assets: Crown servants; members of the security and intelligence agencies; members of the armed forces; the police; employees of certain other non-government organisations that are obliged to comply with the Government's security procedures; and employees of contractors providing goods and services to the Government.
How does the vetting system work?
Candidates for jobs that provide access to sensitive information or sites are asked to complete one or more security questionnaires, which invite them to provide the personal details needed to enable the necessary checks to be carried out. Interviews may also be carried out. The depth of checks varies according to the level of regular access to sensitive information that the job entails.
How confidential is the vetting process?
All personal information gathered during the vetting process is handled in the strictest of confidence by the vetting agencies. These bodies include The Defence Vetting Agency (DVA), The Foreign and Commonwealth Office (FCO) and the Metropolitan Police Service (MPS). In a very small number of cases, where serious risks have been identified, a case may be discussed with the Ministry of Defence, security and personnel authorities. In an even smaller number of cases, and only where the person being vetted agrees, line management may be given some relevant information and be asked to help manage the risk. There is an extremely remote possibility of disclosure of vetting information in connection with criminal or civil proceedings.
How do I get a security clearance?
Individuals and companies cannot ask for a Security Clearance unless they are sponsored, and you will not be sponsored unless you (or your company) are contracted, or are in the process of being contracted, to work on one or more specific MOD / Government classified projects. For large contracts, an officer in the Defence Procurement Agency (DPA) or Defence Logistics Organisation (DLO) - typically a Project Officer - will be your sponsor. For staff in sub-contracted organisations, sponsorship will be provided through the prime contractor.
Why can't I just apply for a security clearance?
Security Clearance provides a certain level of assurance, at a point in time, as to an individual's suitability to have trusted access to sensitive information. It does not provide a guarantee of future reliability, and all security clearances are kept under review to ensure that the necessary level of assurance is maintained. This review is carried out by Government Departments and Government-sponsored contractors, who are responsible for the oversight and aftercare of individuals granted a security clearance. This would not be possible in the case of private individuals.
Security Vetting / Security Clearance is carried out to the following levels by approved government bodies.
Levels of UK Clearance:
DV / Developed Vetting (MOD)
SC / Security Check
CTC / Counter Terrorist Check
EBS / Enhanced Baseline Standard
BPSS / Baseline Personnel Security Standard
NATO / NATO Cleared
MPS / Metropolitan Police Service
SIA / Security Industry Authority
ECRB / Enhanced Criminal Records Bureau
CRB / Criminal Records Bureau

*** I am SC Cleared and working on such a program ***

Tuesday 2 February 2010

Avatar - The computing and data centre behind making of

The computing and data centre behind the making of Avatar

A palm-swept suburb of Wellington, New Zealand is not the first place you'd look for one of the most powerful purpose-built data centers in the world. Yet Miramar, pop. 8,334, is home to precisely that, along with a huge campus of studios, production facilities and soundstages.
The compound is a project that began 15 years ago, inspired by filmmakers Peter Jackson, Richard Taylor and Jamie Selkirk. The studios have since been the main location for creating The Lord of the Rings movies, King Kong, and several others.

Right in the middle sits Weta Digital, the increasingly famous visual effects production house behind high-end commercials and blockbuster movies, most lately the $230 million James Cameron extravaganza AVATAR.
Despite the locale, Weta has drawn plenty of attention. Five Academy Award nominations and four Oscars will do that, but publicist Judy Alley says nothing has matched the buzz of AVATAR. "We've done more than 100 interviews in the last few months," Alley says. With most of the attention focused on the movie's immersive look, Alley was glad someone was interested in looking at the technology installation that sits within Weta and kindly connected us to two of the people who make it run.
As they explained, what makes Weta and a project like AVATAR work is in equal parts the computing power of the data center that creates the visual effects, and the data management of artistic processes that drive the computing.
Hot Gear
Weta Digital is really a visual effects job shop that manages thousands of work orders of intense amounts of data. That preselects most of the fast, constant capacity equipment required. The data center used to process the effects for AVATAR is Weta’s 10,000 square foot facility, rebuilt and stocked with HP BL2x220c blades in the summer of 2008.
The computing core - 34 racks, each with four chassis of 32 machines each - adds up to some 40,000 processors and 104 terabytes of RAM. The blades read and write against 3 petabytes of fast fiber channel disk network area storage from BluArc and NetApp.
All the gear sits tightly packed and connected by multiple 10-gigabit network links. “We need to stack the gear closely to get the bandwidth we need for our visual effects, and, because the data flows are so great, the storage has to be local,” says Paul Gunn, Weta’s data center systems administrator.
That ruled out colocation or cloud infrastructure, leaving Gunn as a sort of owner-operator responsible for keeping the gear running. It also required some extra engineering for the hardware because the industry standard of raised floors and forced-air cooling could not keep up with the constant heat coming off the machines churning out a project like AVATAR.
Heat exchange for an installation like Weta's has to come from enclosed, water-cooled racks where the hot air is sucked into a radiator and cycled back through the front of the machines. "Plus," Gunn says, "we run the machines a bit warm, which modern gear doesn't mind, and the room itself is fairly cool."

With building costs absorbed, water cooling becomes much less expensive than air conditioning, and the engineering in the data center allows fine tuning. “I don’t want to give you an exact figure,” says Gunn, “but we’re talking tens of thousands of dollars saved by changing the temperature by a degree.”
Because of passive heat exchangers and the local climate, Weta pays no more than the cost of running water pumps to get rid of heat for all but a couple months a year. Just weeks ago, Weta won an energy excellence award for building a smaller footprint that came with 40 percent lower cooling costs for a data center of its type.
Throughput Revisited
The other half of Weta Digital’s processing story comes from the intense visual effect activities that heat up the data center.
Weta is actually two companies: Weta Workshop, where a crew of artists and craftsmen create physical models, and the like-named Weta Digital, which creates digital effects for commercials, short films and blockbuster movies.
"If it's something that you can hold in your hand, it comes from Weta Workshop," says Gunn, "whereas if it's something that doesn't exist, we'll make it."
In the visual effects process, a mix of inputs comes from storyboards, director revisions and tweaking by internal and external digital artists who turn a director's concept into an image via 3D software such as Maya or Pixar's RenderMan. Artists work through concepts, and iterate versions to get movement and lighting just right. It's nothing the movie industry hasn't done all along, says Gunn, only now the tools are different and more data intensive.
The main activity in a visual effects data center is called rendering, the process of turning the digital description of an image into an actual image that can be saved to disk and eventually written to film or another media.
The banks of computers are called render walls, where Joe Wilkie serves as Weta’s “manager wrangler,” the person who oversees the data flow and feeds jobs through the pipeline.
“Wrangler” is a traditional but still common film industry term that first referred to the people who herded the horses and other livestock in Western movies. Likewise, Wilkie says he’s most often called a “render wrangler,” in this case someone who rounds up digital files rather than cattle. “Each part of a movie is an individual item, and it all has to be put together,” he says. “So when an artist is working on a shot, they will hit a button that launches a job on the render wall and loads it into our queueing system.”
The queueing system is a Pixar product called Alfred, which creates a hierarchical job structure or tree of multiple tasks that have to run in a certain order. In any single job, there might be thousands of interdependent tasks. As soon as CPUs on the render wall are freed up, new tasks are fired at idle processors.
At the peak of AVATAR, Wilkie was wrangling more than 10,000 jobs and an estimated 1.3 to 1.4 million tasks per day. Each frame of the 24 frame-per-second movie saw multiple iterations of back and forth between directors and artists and took multiple hours to render.

For Gunn’s data center, that added up to processing seven or eight gigabytes of data per second, a job that ran 24 hours a day for the last month or more of production. It’s a Goldilocks task of keeping the gear running fast, “not too fast, not too slow, but just right,” Gunn says, to keep the production on schedule. “It’s a complex system and when you’re on deadline with a project like this, you really want to make sure the lights stay on.”
A final film copy of AVATAR is more humble than all the back and forth that occurred in its creation: at 12 megabytes per frame, each second stamped onto celluloid amounts to 288 megabytes or 17.28 gigabytes per minute. Deduct the credits from the 166-minute movie and you understand better what the final file consists of.
But the immersive effect of AVATAR comes from the many hours or days of attention to each of about 240,000 frames that go into the final product. Weta asked us to mention vfx supervisor Joe Letteri, who oversaw the interactions of directors, a half-dozen lead concept artists and the supporting artists who made the technology process so intensive and powerful.

Sunday 31 January 2010

OLS with BO Security Model and LDAP in a Data Warehouse

The following are the Business Objects recommendations for creating a security model (see the BusinessObjects Enterprise Administrator's Guide); the security model described here should follow them.
• Use access levels wherever possible. These predefined sets of rights simplify administration by grouping together rights associated with common user needs.
• Set rights and access levels on top-level folders. Enabling inheritance will allow these rights to be passed down through the system with minimal administrative intervention.
• Avoid breaking inheritance whenever possible. By doing so, you can reduce the amount of time it takes to secure the content that you have added to BusinessObjects Enterprise.
• Set appropriate rights for users and groups at the folder level, then publish objects to that folder. By default, users or groups who have rights to a folder will inherit the same rights for any object that you subsequently publish to that folder.
Note: this recommendation cannot be followed here because of the access requirements of the reports in the RBAC matrix; in general, rights are applied to individual objects (reports) within the folders.
• Organize users into user groups, assign access levels and rights to the entire group, and assign access levels and rights to specific members when necessary.
• Create individual Administrator accounts for each administrator in the system and add them to the Administrators group to improve accountability for system changes.
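The sketch below is not the BusinessObjects SDK; it is only a toy illustration of the folder-inheritance idea behind these recommendations, where rights granted to a group on a top-level folder flow down to anything published beneath it unless inheritance is explicitly broken. The folder and group names are invented for the example.

# Toy model of folder-level rights inheritance (illustrative only).
class Folder:
    def __init__(self, name, parent=None, inherit=True):
        self.name, self.parent, self.inherit = name, parent, inherit
        self.rights = {}                     # group -> access level

    def grant(self, group, access_level):
        self.rights[group] = access_level

    def effective_rights(self):
        """Rights set locally, plus anything inherited from parent folders."""
        rights = {}
        if self.inherit and self.parent:
            rights.update(self.parent.effective_rights())
        rights.update(self.rights)           # local grants override inherited
        return rights

# Set rights once at the top-level folder, then publish reports beneath it.
finance = Folder("Finance")
finance.grant("FinanceUsers", "View")
finance.grant("FinanceAdmins", "Full Control")

monthly = Folder("Finance/Monthly Reports", parent=finance)        # inherits
payroll = Folder("Finance/Payroll", parent=finance, inherit=False)
payroll.grant("PayrollTeam", "View")         # broken inheritance: explicit only

print(monthly.effective_rights())  # {'FinanceUsers': 'View', 'FinanceAdmins': 'Full Control'}
print(payroll.effective_rights())  # {'PayrollTeam': 'View'}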

What I have found in this installation of Business Objects + Oracle 10g + Oracle Streams + Oracle Label Security is that the OLS labels are not really doing anything beyond what the BO security model implementation above, combined with LDAP, is already doing.

I would like to discuss and hear about successful implementations of OLS with LDAP and BO security models. Any case studies?

Tuesday 6 October 2009

A checklist for NRIs buying a house in India

Investment from any source in the housing sector is welcome in today's Indian real estate scenario, from an industry perspective. Be it resident Indians, NRIs or even companies, constructing houses creates jobs for a lot of people. A back-of-the-napkin calculation shows that a 1,000 sq ft house creates around 100 direct employment opportunities (architect, building engineer, masons, helpers, electricians, plumbers, painters, carpenters, etc.) and over 1,000 indirect ones (people working in cement plants, brick kilns, tile kilns, electrical fittings companies, saw mills, steel plants, paint companies, etc.). Of course, the duration of the employment will depend on a number of factors such as proximity to supply sources for material and labour, access to high-tech equipment, the architecture, and so on.

But constructing a house is not all that easy. It is not without substance that a Tamil saying goes, "Veetai Katti Par, Kalyanathai Panni Par" (the saying rates building a house and conducting a child's marriage among the toughest of tasks).

For NRIs, the sheer distance and their absence during construction multiply the problems of an already difficult task. There was an NRI based in the USA who got a wonderful sales pitch from a builder. The salesman met the NRI at his office in the USA, arranged for all the documentation and also sent video clippings of the apartment in Bangalore. Convinced of its genuineness, the NRI transferred Rs. 50 lakhs to the builder's account. The date for the house-warming was fixed for a month later. The NRI could not make it to the function due to pressing office work and asked his parents to do the poojas.

The parents got the shock of their lives when they landed at the apartment complex the day before the poojas. The complex had only one sample apartment finished (the one in the video). They were told by the project manager at the site that the poojas could be done at any time, but the apartment could be delivered only after "6 months".

Another NRI, who was building a house himself using an experienced and well-referenced engineer, found that his house's orientation had been shifted by 15 feet. This left him with space on the wrong side of the house, squashing his plans to build a small commercial complex in the future. He now has space for parking four cars but none for building a rent-worthy structure!

A number of checks could have kept both of the NRIs above on the safer side:

Thankfully, there are a number of professional builders who are a lot more trustworthy, so doing a bit of research on a builder's track record can help.

For any real estate purchase, it is preferable to visit the site before buying. The exercise is worth it not only because a large amount of money is being committed but also because reversing the decision proves costly. If the NRI cannot make it, he can request a trusted friend or relative to make the site visit on his behalf.

Going for a housing loan through a bank will ensure that the money is released only in stages, which keeps it safe during construction. Also, every bank's local branches maintain a list of shortlisted builders whose constructions are pre-approved for loans. It is better to buy only these constructions, as banks are quite stringent in their pre-approval norms and shortlist only builders with a proven track record and only projects that comply with all legal norms.

After the construction, management of the asset is one of the major issues faced by NRIs, and there is no easy solution. Some society associations support building owners with services like maintenance and rent collection. There are also the "friendly neighbourhood real estate agents" who may sometimes double up as maintenance managers, though the "friendly" can turn "greedy" after a while. A few professional real estate management firms operate in most metros and are now expanding into Tier 2 cities too.

Check whether the construction rate quoted is for built-up area or carpet area. Construction is generally quoted for built-up area, while rental is quoted only for carpet area; there can be a difference of 15% to 20% between the two depending on the type of construction. Apartments today also use the concept of super built-up area, which on top of the built-up area includes staircases, common passages, fire-escape passages, etc. The super built-up area can be bloated by as much as 50% of the carpet area.

Robert Allen, the real estate mogul, suggests the 100-20-10-1 rule for any real estate purchase: check out 100 properties in person, shortlist 20 of them for deeper scrutiny, enter into negotiations with the sellers of 10, and finally buy the ONE that is best suited.

Technically, there should be a check for all statutory approvals - town planning (nobody wants a flyover at arm's length from the balcony!), water supply and sewage disposal, safety approval from the local fire department, etc. It is always better to ask the builder for the encumbrance certificate and the title deed and to get a legal opinion from a lawyer.

Don't hesitate to ask. This is probably the most important point. Often, to avoid being thought of as less intelligent, we ask fewer questions. For any investment, and particularly for real estate, the more questions asked, the better the investment. The genuineness of the promoter can be gauged by the patience, promptness and depth of the answers. Answers like "Don't worry about that, we will manage", without going into specifics, are danger signs.

Take time. Do not limit the checking of properties and the decision-making to the time you are present in India; a two-to-four-week holiday cannot be expected to turn into a real estate investment window. Start the process before you come here. If you cannot decide before you leave, that is OK: a Power of Attorney given to a parent or a relative can be used to complete the actual purchase even after you leave the shores of India.
Source: ET

Monday 7 September 2009

Virga


Did you know that rain can evaporate before it hits the ground?

In meteorology, virga is an observable streak or shaft of precipitation that falls from a cloud but evaporates before reaching the ground.

Virga can cause varying weather effects, because as rain changes from liquid to vapor it removes heat from the air, owing to the high heat of vaporization of water. In some instances, these pockets of colder air can descend rapidly, creating a dry microburst that can be extremely hazardous to aviation. Conversely, air in which precipitation has evaporated at high altitude can heat compressionally as it descends, resulting in a gusty downburst that may substantially and rapidly warm the surface temperature.

Virga can produce dramatic and beautiful scenes, especially during a red sunset.

The word virga is derived from the Latin for "twig" or "branch". A backronym sometimes found in amateur discussions of meteorology is "Variable Intensity Rain Gradient Aloft."

http://en.wikipedia.org/wiki/Virga

See a picture at http://imgur.com/gUZof.jpg and details at http://www.reddit.com/r/pics/comments/9h5ei/rain_rain_go_away/